ForFun is a database of linguistic forms and their syntactic functions, built from the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The purpose of the Prague Database of Forms and Functions (ForFun) is to help linguists study the form-function relation, which we assume to be one of the principal tasks of both theoretical linguistics and natural language processing.
A prototypical question to be asked is "What purposes does the preposition 'po' serve?" or "What linguistic means in a sentence can express the meaning 'a destination of an action'?". There are almost 1500 distinct forms (besides the preposition 'po') and 65 distinct functions (besides 'destination').
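The two prototypical questions correspond to looking up the same set of form-function pairs in both directions. The following is a minimal sketch of that idea, not ForFun's actual implementation or interface: the pairs, the labels such as `po+locative` and `time_after`, and the function name `build_indexes` are all illustrative assumptions rather than data taken from the database.

```python
from collections import defaultdict

# Toy (form, function) pairs. These are hypothetical placeholders chosen
# for illustration; they are not extracted from ForFun.
PAIRS = [
    ("po+locative", "time_after"),
    ("po+locative", "path"),
    ("do+genitive", "destination"),
    ("na+accusative", "destination"),
]


def build_indexes(pairs):
    """Index the pairs in both directions: form -> functions, function -> forms."""
    by_form = defaultdict(set)
    by_function = defaultdict(set)
    for form, function in pairs:
        by_form[form].add(function)
        by_function[function].add(form)
    return by_form, by_function


by_form, by_function = build_indexes(PAIRS)

# "What purposes does the form serve?"
functions_of_po = sorted(by_form["po+locative"])

# "Which forms can express the function 'destination'?"
forms_for_destination = sorted(by_function["destination"])

print(functions_of_po)
print(forms_for_destination)
```

Keeping both indexes side by side means either question is a single dictionary lookup, at the cost of storing each pair twice.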
This paper presents and discusses the results of an experiment testing the validity of the Trace Deletion Hypothesis (Grodzinsky 1989, 2000) in Czech. The Trace Deletion Hypothesis (TDH) was proposed to account for a receptive syntactic deficit in Broca's aphasics that involves structures containing transformational operations such as the passive. According to the TDH, in passive constructions Broca's aphasics fail to assign a semantic θ-role to the derived subject syntactically, so they assign it the Agent θ-role on the basis of linear order (the Default Principle), which results in a structure with two potential Agents. This strategy is supposed to lead to chance performance of Broca's aphasics on these structures, as they are forced to guess the distribution of the Agent and Patient θ-roles. The results of our experiment, however, do not support the TDH: of the six subjects tested, only one performed at chance. The error rate for reversible passive structures in Czech was 33.34%, which corresponds to above-chance performance. Given these results, the validity of the TDH is called into question, also with respect to the development of the generative theory itself.
Andrea Hudousková, Eva Flanderková, Barbara Mertins, Kristýna Tomšů. Includes a list of references.
NomVallex 2.0 is a manually annotated valency lexicon of Czech nouns and adjectives, created in the theoretical framework of the Functional Generative Description and based on corpus data (the SYN series of corpora from the Czech National Corpus and the Araneum Bohemicum Maximum corpus). In total, NomVallex comprises 1027 lexical units contained in 570 lexemes, covering the following parts of speech and derivational categories: deverbal or deadjectival nouns, and deverbal, denominal, deadjectival or primary adjectives. Valency properties of a lexical unit are captured in a valency frame (modeled as a sequence of valency slots, each supplemented with a list of morphemic forms) and documented by corpus examples. To make it possible to study the relationship between the valency behavior of base words and their derivatives, lexical units of nouns and adjectives in NomVallex are linked to their respective base lexical units (contained either in NomVallex itself or, in the case of verbs, in the VALLEX lexicon), linking up to three parts of speech (i.e., noun – verb, adjective – verb, noun – adjective, and noun – adjective – verb).
In order to facilitate comparison, this submission also contains abbreviated entries of the base verbs of these nouns and adjectives from the VALLEX lexicon and simplified entries of the covered nouns and adjectives from the PDT-Vallex lexicon.
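The structure described for NomVallex 2.0 (a valency frame as a sequence of slots, each with a list of morphemic forms, plus a link from a derivative to its base lexical unit) can be sketched as a small data model. This is only an assumed in-memory representation: the class names, field names, slot labels, form labels and lemmas below are hypothetical placeholders and do not reflect the lexicon's actual file format or content.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValencySlot:
    label: str        # slot label, e.g. a functor name (placeholder values here)
    forms: List[str]  # morphemic forms that can realize the slot


@dataclass
class LexicalUnit:
    lemma: str
    pos: str
    frame: List[ValencySlot]                       # valency frame = sequence of slots
    examples: List[str] = field(default_factory=list)
    base: Optional["LexicalUnit"] = None           # link to the base lexical unit


def derivation_chain(unit: Optional[LexicalUnit]) -> List[str]:
    """Follow base links from a derivative up to its ultimate base word."""
    chain = []
    while unit is not None:
        chain.append(f"{unit.lemma} ({unit.pos})")
        unit = unit.base
    return chain


def forms_by_slot(unit: LexicalUnit) -> dict:
    """Map each slot label to its list of morphemic forms."""
    return {slot.label: slot.forms for slot in unit.frame}


# Hypothetical noun - adjective - verb chain with placeholder content.
verb = LexicalUnit("base_verb", "verb",
                   [ValencySlot("ACT", ["nominative"]),
                    ValencySlot("PAT", ["accusative"])])
adjective = LexicalUnit("derived_adjective", "adjective",
                        [ValencySlot("PAT", ["genitive"])], base=verb)
noun = LexicalUnit("derived_noun", "noun",
                   [ValencySlot("PAT", ["genitive"])], base=adjective)

chain = derivation_chain(noun)
print(chain)
print(forms_by_slot(verb)["PAT"], forms_by_slot(noun)["PAT"])
```

Storing the link on the derivative rather than on the base keeps each chain a simple linked list, which matches the direction in which the description says the units are linked.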
The NomVallex I. lexicon describes valency of Czech deverbal nouns belonging to three semantic classes, i.e. Communication (dotaz 'question'), Mental Action (plán 'plan') and Psych State (nenávist 'hatred'). It covers both stem-nominals and root-nominals (dotazování se 'asking' and dotaz 'question'). In total, the lexicon includes 505 lexical units in 248 lexemes. Valency properties are captured in the form of valency frames, specifying valency slots and their morphemic forms, and are exemplified by corpus examples.
In order to facilitate comparison, this submission also contains abbreviated entries of the source verbs of these nouns from the Vallex lexicon and simplified entries of the covered nouns from the PDT-Vallex lexicon.
The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied Linguistics under various projects between 1996 and 2018 on the original texts, i.e., all annotation from PDT 1.0, PDT 2.0, PDT 2.5, PDT 3.0, PDiT 1.0 and PDiT 2.0, plus corrections, a new structure of the basic documentation and a new list of authors covering all previous editions. The Prague Dependency Treebank 3.5 (PDT 3.5) contains the same texts as the previous versions since 2.0; there are 49,431 sentences (832,823 words) annotated on all layers, from tectogrammatical annotation to syntax to morphology. There are additional sentences annotated for syntax and morphology; the totals for the lower layers of annotation are: 87,913 sentences with 1,502,976 words at the analytical layer (surface dependency syntax) and 115,844 sentences with 1,956,693 words at the morphological layer of annotation (these totals include the sentences that are also annotated at the higher layers). Closely linked to the tectogrammatical layer is the annotation of sentence information structure, multiword expressions, coreference, bridging relations and discourse relations.
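Because the totals for the lower layers include the data also annotated at the higher layers, the amount of data whose highest annotation is a given layer follows by subtraction. The short sketch below works through that arithmetic with the figures quoted above; the names `LAYERS` and `exclusive_counts` are illustrative, and the calculation assumes the layers are strictly nested, as the description of the totals suggests.

```python
# (layer, sentences, words) as quoted in the PDT 3.5 description,
# ordered from the highest annotation layer to the lowest.
LAYERS = [
    ("tectogrammatical", 49_431, 832_823),
    ("analytical", 87_913, 1_502_976),
    ("morphological", 115_844, 1_956_693),
]


def exclusive_counts(layers):
    """Return, per layer, the sentences and words whose highest annotation is that layer.

    Assumes nesting: everything annotated at a higher layer is also
    counted in the totals of every lower layer.
    """
    result = {}
    prev_sentences, prev_words = 0, 0
    for name, sentences, words in layers:
        result[name] = (sentences - prev_sentences, words - prev_words)
        prev_sentences, prev_words = sentences, words
    return result


counts = exclusive_counts(LAYERS)
for name, (sentences, words) in counts.items():
    print(f"{name}: {sentences} sentences, {words} words")
```

Under that assumption, 38,482 sentences (670,153 words) are annotated up to the analytical layer only, and 27,931 sentences (453,717 words) at the morphological layer only.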
VALLEX 3.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses, which are characterized by glosses and examples. VALLEX 3.0 describes almost 4,600 Czech verbs in more than 10,800 lexical units, i.e., given verbs in the given senses.
VALLEX 3.0 is a collection of linguistically annotated data and documentation, resulting from an attempt at a formal description of valency frames of Czech verbs. To satisfy the different needs of different potential users, the lexicon is distributed (i) in an HTML version (which allows for easy and fast navigation through the lexicon) and (ii) in a machine-tractable form as a single XML file, so that the VALLEX data can be used in NLP applications.