A test statistic for the homogeneity of two or more covariance matrices is presented for the case where the distributions may be non-normal and the dimension may exceed the sample size. Using the Frobenius norm of the difference between the covariance matrices, which is zero under the null hypothesis and positive under the alternative, the statistic is constructed as a linear combination of consistent, location-invariant estimators of the trace functions that make up the norm. These estimators are defined as U-statistics, and the corresponding theory is used to derive the limiting normal distribution of the statistic under a few mild assumptions as both the sample size and the dimension grow large. Simulations are used to assess the accuracy of the statistic.
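The trace decomposition behind the statistic can be illustrated for two samples: ||Σ1 − Σ2||_F^2 = tr(Σ1^2) + tr(Σ2^2) − 2 tr(Σ1Σ2). The sketch below assumes independent samples stored row-wise and uses difference-based U-statistics as one common way to obtain unbiased, location-invariant trace estimators. For distinct indices, E[((x_i − x_j)'(x_k − x_l))^2] = 4 tr(Σ^2), and the analogous two-sample term has expectation 4 tr(Σ1Σ2). The function names are hypothetical, and the estimators and normalization are illustrative rather than necessarily those of the paper. The variance estimate needed to standardize the statistic is also omitted.

```python
import numpy as np


def _ordered_pairs(n):
    """Return index arrays (i, j) for all ordered pairs with i != j."""
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    keep = i != j
    return i[keep], j[keep]


def trace_sq_u(X):
    """Unbiased, location-invariant U-statistic estimate of tr(Sigma^2).

    Averages ((x_i - x_j)'(x_k - x_l))^2 / 4 over all distinct i, j, k, l.
    Memory grows like n^4, so this is only meant for small samples.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 4:
        raise ValueError("need at least 4 observations")
    i, j = _ordered_pairs(n)
    D = X[i] - X[j]          # pairwise differences: the unknown mean cancels
    G = D @ D.T              # inner products between all difference vectors
    # Keep only pairs of pairs whose four indices are all distinct.
    disjoint = ((i[:, None] != i[None, :]) & (i[:, None] != j[None, :])
                & (j[:, None] != i[None, :]) & (j[:, None] != j[None, :]))
    return np.mean(G[disjoint] ** 2) / 4.0


def trace_prod_u(X, Y):
    """Unbiased, location-invariant estimate of tr(Sigma_1 Sigma_2).

    Assumes X and Y are independent samples, so no index overlap arises.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    i, j = _ordered_pairs(X.shape[0])
    k, l = _ordered_pairs(Y.shape[0])
    G = (X[i] - X[j]) @ (Y[k] - Y[l]).T
    return np.mean(G ** 2) / 4.0


def frobenius_sq_u(X, Y):
    """Estimate ||Sigma_1 - Sigma_2||_F^2 as a linear combination of trace estimates."""
    return trace_sq_u(X) + trace_sq_u(Y) - 2.0 * trace_prod_u(X, Y)
```

Because every estimate is built from within-sample differences, adding a constant vector to a sample leaves it unchanged. This is the location invariance mentioned in the abstract. With more than two groups, such pairwise terms would have to be combined in some way, and this sketch does not attempt that.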
This is an open dataset of sentences from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).
This is an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification.
These are supplementary materials for an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification, and is available at http://hdl.handle.net/11234/1-4615. These supplementary materials contain OCR texts produced by different OCR engines for the book pages for which we have both high-resolution scanned images and annotations for OCR evaluation.