dc.contributor.author | Sourada, Tomáš |
dc.date.accessioned | 2024-04-22T09:05:06Z |
dc.date.available | 2024-04-22T09:05:06Z |
dc.date.issued | 2024 |
dc.identifier.uri | http://hdl.handle.net/11234/1-5471 |
dc.description | The Czech OOV Inflection Dataset is a dataset of inflected Czech nouns, designed for evaluating inflection in out-of-vocabulary (OOV) conditions. It consists of two parts: a standard lemma-disjoint train-dev-test split of a subset of the noun paradigms in the existing morphological dictionary Czech MorfFlex 2.0 (files train, dev and test-MorfFlex); and a small set of neologisms from Čeština 2.0, annotated with their inflected forms (file test-neologisms). |
dc.language.iso | ces |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.relation.isreferencedby | https://doi.org/10.48550/arXiv.2404.08974 |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.source.uri | https://github.com/tomsouri/cz-inflect |
dc.subject | morphological generation |
dc.subject | morphology |
dc.subject | neologisms database |
dc.subject | Czech |
dc.title | Czech OOV Inflection Dataset |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.mediaType | text |
metashare.ResourceInfo#ContentInfo.detailedType | computationalLexicon |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Tomáš Sourada sourada@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
size.info | 6270880 entries |
files.size | 17906612 |
files.count | 1 |
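The description calls the MorfFlex part a lemma-disjoint split: every inflected form of a given lemma ends up in exactly one of train, dev and test, so the test lemmas are never seen in training, which is what makes the evaluation an OOV one. The sketch below only illustrates that property on toy data. The (lemma, tag, form) row layout, the tag strings, the split ratios and the function names are hypothetical assumptions, not the documented file format or the procedure actually used to build the dataset.

```python
import random
from collections import defaultdict

# Hypothetical row layout: (lemma, tag, form). This is an illustrative
# assumption, not the documented column format of the dataset's TSV files.


def lemma_disjoint_split(rows, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole paradigms (all rows sharing a lemma) to train/dev/test."""
    by_lemma = defaultdict(list)
    for row in rows:
        by_lemma[row[0]].append(row)
    lemmas = sorted(by_lemma)  # deterministic order before shuffling
    random.Random(seed).shuffle(lemmas)
    n_train = int(ratios[0] * len(lemmas))
    n_dev = int(ratios[1] * len(lemmas))
    groups = (
        lemmas[:n_train],
        lemmas[n_train:n_train + n_dev],
        lemmas[n_train + n_dev:],  # remaining lemmas go to test
    )
    # Expand each lemma group back into the full list of its rows.
    return tuple([row for lemma in group for row in by_lemma[lemma]] for group in groups)


def is_lemma_disjoint(*splits):
    """Return True if no lemma occurs in more than one split."""
    seen = set()
    for split in splits:
        lemmas = {row[0] for row in split}
        if lemmas & seen:
            return False
        seen |= lemmas
    return True


# Toy paradigms with hypothetical tags.
rows = [
    ("hrad", "sg.nom", "hrad"), ("hrad", "sg.gen", "hradu"),
    ("žena", "sg.nom", "žena"), ("žena", "sg.gen", "ženy"),
    ("město", "sg.nom", "město"), ("město", "sg.gen", "města"),
]
train, dev, test = lemma_disjoint_split(rows)
print(is_lemma_disjoint(train, dev, test))  # True
```

Splitting by lemma rather than by row is the design choice that matters here: a row-level split would leave some forms of a test lemma in the training data, so the model would no longer be evaluated on truly unseen words. The same check could be applied to the lemma column of train, dev and test-MorfFlex once the actual TSV layout has been confirmed from the README.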
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- Name: CzechOOVInflectionDataset.tar.xz
- Size: 17.08 MB
- Format: application/x-xz
- Description: The Czech OOV Inflection Dataset, consisting of a strictly lemma-disjoint train-dev-test split of MorfFlex nouns and a small test set of inflected noun neologisms from Čeština 2.0.
- MD5: f768e0166d0e81535e8afb2555d3eca3
- CzechOOVInflectionDataset/
  - test-neologisms.tsv (34 kB)
  - LICENSE (20 kB)
  - dev.tsv (17 MB)
  - README (3 kB)
  - test-MorfFlex.tsv (17 MB)
  - train.tsv (142 MB)
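The MD5 checksum listed for the archive can be used to confirm that a download is intact before unpacking it. A minimal sketch using only the Python standard library (`hashlib`, `tarfile`); the function names `md5_of` and `verify_and_extract` are hypothetical, and the local path is simply wherever the archive was saved.

```python
import hashlib
import tarfile

# Checksum listed for CzechOOVInflectionDataset.tar.xz in this record.
EXPECTED_MD5 = "f768e0166d0e81535e8afb2555d3eca3"


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(path, dest=".", expected=EXPECTED_MD5):
    """Check the archive's MD5, then unpack the .tar.xz into `dest`.

    Returns the member names found in the archive.
    """
    actual = md5_of(path)
    if actual != expected:
        raise ValueError(f"MD5 mismatch: expected {expected}, got {actual}")
    with tarfile.open(path, "r:xz") as tar:
        names = tar.getnames()
        # Use the safe "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For example, `verify_and_extract("CzechOOVInflectionDataset.tar.xz")` would raise on a corrupted download and otherwise unpack the CzechOOVInflectionDataset/ directory listed above. MD5 only guards against accidental corruption here, not against deliberate tampering.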