Automatická slovnědruhová desambiguace slova to v ustálených větných výrazech
- Title:
- Automatická slovnědruhová desambiguace slova to v ustálených větných výrazech
- The automatic part-of-speech disambiguation of the word to in fixed collocations
- Creator:
- Hnátková, Milena
- Identifier:
- https://cdk.lib.cas.cz/client/handle/uuid:c2aba142-6630-45fe-af1c-3a9559ecec2b
- uuid:c2aba142-6630-45fe-af1c-3a9559ecec2b
- Subject:
- corpus, automatic morphological disambiguation, automatic identification of collocations, sentential phrases, word form to, korpus, automatická morfologická analýza, vyhledávání ustálených slovních spojení, větné frazémy, and slovní tvar toto
- Type:
- model:article and TEXT
- Format:
- unmediated (bez média) and volume (svazek)
- Description:
- This paper deals with the automatic part-of-speech disambiguation of Czech texts containing the word to (English: it) in fixed collocations, used especially in spoken Czech, and also with case identification for the pronominal reading of this word. The word to is ambiguous: automatic morphological analysis assigns it either the pronominal lemma ten (it), as a nominative or accusative singular neuter, or the particle lemma to. Because the nonprepositional nominative and accusative cases are very difficult to distinguish automatically in Czech texts, the paper focuses primarily on to as a particle. The software module that automatically identifies collocations in Czech corpus texts is part of the rule-based automatic morphological disambiguation used for tagging texts of synchronic Czech in the corpora of the SYN series; it deals mainly with the disambiguation of nongrammatical collocations and phrases. The paper focuses on fixed expressions listed in the Dictionary of Czech Phraseology and Idiomatics and is based on a description of the automatic identification and classification of collocations containing the word to in the SYN2010 corpus. It also presents examples (primarily idioms) where automatic disambiguation using general grammatical rules yields unreliable results.
- Language:
- Czech
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/
- policy:public
- Coverage:
- 22-35
- Source:
- Korpus - gramatika - axiologie: časopis pro korpusový výzkum a hodnocení jazyka | 2013 | Number:7
- Harvested from:
- CDK
- Metadata only:
- false