This is an open dataset of sentences from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).
This is an open dataset of scanned images and OCR texts from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification.
These are supplementary materials for an open dataset of scanned images and OCR texts from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification, and is available at http://hdl.handle.net/11234/1-4615. These supplementary materials contain OCR texts produced by different OCR engines for the book pages for which we have both high-resolution scanned images and annotations for OCR evaluation.
Pavel Josef Šafařík; edited by Jan Vilikovský, 1,000 copies, includes bibliographical references and an index, and partly Old Czech, English, German, and Latin text
compiled by Čeněk Zíbrt, includes indexes, partly parallel English, French, German, Italian, Latin, Polish, and Russian text, and published by Class III of the Czech Academy of Emperor Franz Joseph for Sciences, Literature and Arts in Prague