This article examines the reliability of statistical models that use visualization of word distances using computer-assisted text analysis. This study looks at the choice of parameters in the COOA - software for word co-occurrence analysis. The word co-occurrence analysis enables visualization of text structure through the exploration of the number of co-occurrences of words. The data visualization provided by a multi-dimensional scaling (MDS) procedure is susceptible to a particular form of error. The nonlinear relationship between words with significantly different frequencies lies at the root of this problem where words with higher frequencies are placed in the middle of a two-dimensional MDS map visualization. Words with lower frequency, on the other hand, are forced by the MDS estimator to the edge of the two-dimensional map and their estimated spatial positions are unstable. These two processes are potentially a major source of error in making inferences. One solution for reducing this source of error is to (a) reduce the number of words in a model or (b) increase of the number of model dimensions. This article, however, suggests that a detailed investigation of the word structure and a thorough analysis of the error sources and their meaningful interpretation may be a better solution., Václav Čepelák., and Obsahuje bibliografii
This article examines the reliability of statistical models that use visualization of word distances using computer-assisted text analysis. This study looks at the choice of parameters in the COOA - software for word co-occurrence analysis. The word co-occurrence analysis enables visualization of text structure through the exploration of the number of co-occurrences of words. The data visualization provided by a multi-dimensional scaling (MDS) procedure is susceptible to a particular form of error. The nonlinear relationship between words with significantly different frequencies lies at the root of this problem where words with higher frequencies are placed in the middle of a two-dimensional MDS map visualization. Words with lower frequency, on the other hand, are forced by the MDS estimator to the edge of the two-dimensional map and their estimated spatial positions are unstable. These two processes are potentially a major source of error in making inferences. One solution for reducing this source of error is to (a) reduce the number of words in a model or (b) increase of the number of model dimensions. This article, however, suggests that a detailed investigation of the word structure and a thorough analysis of the error sources and their meaningful interpretation may be a better solution.