Language: German and English / Rights: PUB - LINDAT/CLARIAH-CZ Catalog Search Results

1. A Human-Annotated Dataset for Language Modeling and Named Entity Recognition in Medieval Documents

Creator:: Novotný, Vít, Luger, Kristýna, Štefánik, Michal, Vrabcová, Tereza, and Horák, Aleš
Publisher:: Masaryk University, Brno
Type:: text and corpus
Subject:: NER, named entity recognition, and Medieval
Language:: Czech, English, German, and Latin
Description:: This is an open dataset of sentences from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

2. A Human-Annotated Dataset for Language Modeling and Named Entity Recognition in Medieval Documents (2023-01-05)

Creator:: Novotný, Vít, Luger, Kristýna, Štefánik, Michal, Vrabcová, Tereza, and Horák, Aleš
Publisher:: Masaryk University, Brno
Type:: text and corpus
Subject:: NER, named entity recognition, and Medieval
Language:: Czech, English, German, and Latin
Description:: This is an open dataset of sentences from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

3. A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Documents

Creator:: Novotný, Vít, Seidlová, Kristýna, Vrabcová, Tereza, and Horák, Aleš
Publisher:: Masaryk University, Brno
Type:: image and corpus
Subject:: ocr, optical character recognition, language identification, image super-resolution, sr, and Medieval
Language:: German, Czech, Latin, and English
Description:: This is an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

4. A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Documents: Supplementary Materials

Creator:: Novotný, Vít and Horák, Aleš
Publisher:: Masaryk University, Brno
Type:: text and corpus
Subject:: ocr, optical character recognition, language identification, image super-resolution, sr, and Medieval
Language:: Czech, English, German, and Latin
Description:: These are supplementary materials for an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification and is available at http://hdl.handle.net/11234/1-4615. These supplementary materials contain OCR texts from different OCR engines for book pages for which we have both high-resolution scanned images and annotations for OCR evaluation.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

5. Annotated corpora and tools of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions (edition 1.1)

Creator:: Ramisch, Carlos, Cordeiro, Silvio Ricardo, Savary, Agata, Vincze, Veronika, Barbu Mititelu, Verginica, Bhatia, Archna, Buljan, Maja, Candito, Marie, Gantar, Polona, Giouli, Voula, Güngör, Tunga, Hawwari, Abdelati, Iñurrieta, Uxoa, Kovalevskaitė, Jolanta, Krek, Simon, Lichte, Timm, Liebeskind, Chaya, Monti, Johanna, Parra Escartín, Carla, QasemiZadeh, Behrang, Ramisch, Renata, Schneider, Nathan, Stoyanova, Ivelina, Vaidya, Ashwini, Walsh, Abigail, Aceta, Cristina, Aduriz, Itziar, Antoine, Jean-Yves, Arhar Holdt, Špela, Berk, Gözde, Bielinskienė, Agnė, Blagus, Goranka, Boizou, Loic, Bonial, Claire, Caruso, Valeria, Čibej, Jaka, Constant, Matthieu, Cook, Paul, Diab, Mona, Dimitrova, Tsvetana, Ehren, Rafael, Elbadrashiny, Mohamed, Elyovich, Hevi, Erden, Berna, Estarrona, Ainara, Fotopoulou, Aggeliki, Foufi, Vassiliki, Geeraert, Kristina, van Gompel, Maarten, Gonzalez, Itziar, Gurrutxaga, Antton, Ha-Cohen Kerner, Yaakov, Ibrahim, Rehab, Ionescu, Mihaela, Jain, Kanishka, Jazbec, Ivo-Pavao, Kavčič, Teja, Klyueva, Natalia, Kocijan, Kristina, Kovács, Viktória, Kuzman, Taja, Leseva, Svetlozara, Ljubešić, Nikola, Malka, Ruth, Markantonatou, Stella, Martínez Alonso, Héctor, Matas, Ivana, McCrae, John, de Medeiros Caseli, Helena, Onofrei, Mihaela, Palka-Binkiewicz, Emilia, Papadelli, Stella, Parmentier, Yannick, Pascucci, Antonio, Pasquer, Caroline, Pia di Buono, Maria, Puri, Vandana, Raffone, Annalisa, Ratori, Shraddha, Riccio, Anna, Sangati, Federico, Shukla, Vishakha, Simkó, Katalin, Šnajder, Jan, Somers, Clarissa, Srivastava, Shubham, Stefanova, Valentina, Taslimipoor, Shiva, Theoxari, Natasa, Todorova, Maria, Urizar, Ruben, Villavicencio, Aline, and Zilio, Leonardo
Publisher:: PARSEME
Type:: text and corpus
Subject:: Multiword expressions, verbal multiword expressions, light-verb constructions, verb-particle constructions, inherently reflexive verbs, verbal idioms, and multi-verb constructions
Language:: Bulgarian, German, Modern Greek (1453-), Spanish, Persian, French, Hebrew, Hungarian, Italian, Lithuanian, Polish, Portuguese, Romanian, Slovenian, Turkish, Hindi, Basque, English, and Croatian
Description:: This multilingual resource contains corpora in which verbal multiword expressions (VMWEs) have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). VMWEs were annotated according to the universal guidelines in 19 languages. The corpora are provided in the cupt format, inspired by the CoNLL-U format. The corpora were used in the 1.1 edition of the PARSEME Shared Task (2018). For most languages, morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.1 (2018). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1
Rights:: PARSEME Shared Task Data (v. 1.1) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.1, and PUB

6. C4Corpus (CC BY-NC part)

7. C4Corpus (CC BY-NC-ND part)

8. C4Corpus (CC BY-NC-SA part)

9. C4Corpus (CC BY-ND part)

10. C4Corpus (CC BY-SA part)
