This is an open dataset of scanned images and OCR texts from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification.
Funding: TAČR, project TL03000365 (national funds): Accessible historical sources. Making medieval written documents available in the form of a contextual database