Creator: Horák, Aleš / Rights: PUB - LINDAT/CLARIAH-CZ Catalog Search Results

Creator:: Novotný, Vít, Luger, Kristýna, Štefánik, Michal, Vrabcová, Tereza, and Horák, Aleš
Publisher:: Masaryk University, Brno
Type:: text and corpus
Subject:: NER, named entity recognition, and Medieval
Language:: Czech, English, German, and Latin
Description:: This is an open dataset of sentences from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

Creator:: Novotný, Vít, Seidlová, Kristýna, Vrabcová, Tereza, and Horák, Aleš
Publisher:: Masaryk University, Brno
Type:: image and corpus
Subject:: ocr, optical character recognition, language identification, image super-resolution, sr, and Medieval
Language:: German, Czech, Latin, and English
Description:: This is an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

Creator:: Novotný, Vít and Horák, Aleš
Publisher:: Masaryk University, Brno
Type:: text and corpus
Subject:: ocr, optical character recognition, language identification, image super-resolution, sr, and Medieval
Language:: Czech, English, German, and Latin
Description:: These are supplementary materials for an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification and is available at http://hdl.handle.net/11234/1-4615. These supplementary materials contain OCR texts from different OCR engines for book pages for which we have both high-resolution scanned images and annotations for OCR evaluation.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

Creator:: Medveď, Marek and Horák, Aleš
Publisher:: Masaryk University, NLP Centre
Type:: text and corpus
Subject:: question answering, Simple Question Answering Database, and SQAD
Language:: Czech
Description:: The SQAD database consists of 3301 records obtained from Czech Wikipedia articles. Each record has the following structure: the original sentence(s) from Wikipedia; a question that is directly answered in the text; the expected answer to the question as it appears in the original text; the URL of the Wikipedia web page from which the original text was extracted; and the name of the author of this SQAD record.
Rights:: GNU General Public Licence, version 3, http://opensource.org/licenses/GPL-3.0, and PUB

Creator:: Medveď, Marek, Horák, Aleš, and Kušniráková, Dáša
Publisher:: Natural Language Processing Centre, Faculty of Informatics, Masaryk University
Type:: text and corpus
Subject:: Czech, Simple Question Answering Database, and question answering
Language:: Czech
Description:: Simple question answering database version 2.1 (SQAD_v2.1) created from Czech Wikipedia. Each SQAD record consists of four files (in vertical form, provided with lemmatization and POS tagging) and two metadata files.
Rights:: GNU Library or "Lesser" General Public License 3.0 (LGPL-3.0), http://opensource.org/licenses/LGPL-3.0, and PUB

Creator:: Medveď, Marek and Horák, Aleš
Publisher:: Masaryk University, NLP Centre
Type:: text and corpus
Subject:: Simple Question Answering Database, Czech, and question answering
Language:: Czech
Description:: Simple question answering database version 3 (SQAD v3) created from Czech Wikipedia. The new version consists of 13477 records. Each SQAD record consists of multiple files: question, answer extraction, answer selection, URL, question metadata, and in some cases answer context.
Rights:: GNU Library or "Lesser" General Public License 3.0 (LGPL-3.0), http://opensource.org/licenses/LGPL-3.0, and PUB

Creator:: Medveď, Marek, Horák, Aleš, and Šulganová, Terézia
Publisher:: Natural Language Processing Centre, Faculty of Informatics, Masaryk University
Type:: text and corpus
Subject:: question answering, Czech, and Simple Question Answering Database
Language:: Czech
Description:: Simple question answering database (SQAD) created from Czech Wikipedia. Each SQAD record consists of four files (in vertical form, provided with lemmatization and POS tagging) and two metadata files.
Rights:: GNU General Public Licence, version 3, http://opensource.org/licenses/GPL-3.0, and PUB
