In the existential sentences of Slavonic languages we can find some interesting deviations from the basic Indo-European sentence type, i.e. "Nominative + concordant Verb", for instance the Genitive of negation; in some languages, especially South Slavonic ones, there are examples of the main nominal part of a positive existential sentence (i.e. the name of the existing entity) in the Genitive or even (as in Slovenian povsod jo je) in the Accusative. These deviations can be of interest for the study of the development of Indo-European syntax, as Miklosich and Potebnya observed already in the 19th century. Also relevant in this respect is the opposition between autosemantic (existential or possessive) esse and the (zero or non-zero) copula. These phenomena are studied here from the standpoint of the general opposition between polymorphic and monomorphic structures of the syntactic system.
CoNLL 2017 and 2018 shared tasks:
Multilingual Parsing from Raw Text to Universal Dependencies
This package contains the test data in the form in which they were presented
to the participating systems: raw text files and files preprocessed by UDPipe.
The metadata.json files contain lists of files to process and to output;
README files in the respective folders describe the syntax of metadata.json.
For full training, development and gold standard test data, see
Universal Dependencies 2.0 (CoNLL 2017)
Universal Dependencies 2.2 (CoNLL 2018)
See the download links at http://universaldependencies.org/.
For more information on the shared tasks, see
http://universaldependencies.org/conll17/
http://universaldependencies.org/conll18/
Contents:
conll17-ud-test-2017-05-09 ... CoNLL 2017 test data
conll18-ud-test-2018-05-06 ... CoNLL 2018 test data
conll18-ud-test-2018-05-06-for-conll17 ... CoNLL 2018 test data with metadata
and filenames modified so that it is digestible by the 2017 systems.
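The UDPipe-preprocessed files follow the CoNLL-U format used throughout Universal Dependencies: ten tab-separated columns per token, comment lines starting with `#`, and a blank line after each sentence. The following is a minimal Python sketch for reading such a file, not a script shipped with this package; the function name `read_conllu` and the sample sentence are illustrative assumptions.

```python
from typing import Dict, Iterable, Iterator, List

# Column names of the CoNLL-U format, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dicts."""
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:                 # blank line closes a sentence
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):   # sentence-level comment, skipped here
            continue
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected 10 columns, got {len(cols)}")
            sentence.append(dict(zip(FIELDS, cols)))
    if sentence:                     # file without a trailing blank line
        yield sentence

# Hypothetical sample, not taken from the test data.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
```

For a real file, the same function can be given an open file handle, e.g. `read_conllu(open(path, encoding="utf-8"))`, where `path` is whichever file the metadata.json lists.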
The authors present their respective views on the development of Czech post-war syntactic studies. Their approach is influenced by the fact that they were educated in different syntactic schools: the paper thus combines the Prague and Brno views. V. Šmilauer’s Novočeská skladba (Syntax of Modern Czech, 1947) is understood as the source of contemporary research on Czech syntax. The paper describes the results reached by individual investigators as well as those of research teams. In the authors’ opinion, Two-Level Valency Syntax (represented by F. Daneš and his close collaborators and reflected in the Czech Academic Grammar) and Functional Generative Grammar (developed by P. Sgall and his colleagues) have formed the main paradigms of Czech syntax since 1960. Both theories incorporate the results of the classical Praguian functional approach as well as results of the generative paradigm. The authors conclude that the Prague and Brno views on the development of Czech syntactic studies are not incompatible but rather complementary, and that the methods of formal and corpus linguistics are attractive and useful for young researchers.
ForFun is a database of linguistic forms and their syntactic functions, built using the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The purpose of the Prague Database of Forms and Functions (ForFun) is to help linguists study the form-function relation, which we assume to be one of the principal tasks of both theoretical linguistics and natural language processing.
A prototypical question to be asked is "What purposes does the preposition 'po' serve?" or "What are the linguistic means in the sentence that can express the meaning 'a destination of an action'?". There are almost 1500 distinct forms (besides the preposition 'po') and 65 distinct functions (besides 'destination').
HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. This version uses Universal Dependencies as the common annotation style.
Update (November 2017): for a current collection of harmonized dependency treebanks, we recommend using Universal Dependencies (UD). All of the corpora that are distributed in HamleDT in full are also part of the UD project; only some corpora from the Patch group (where HamleDT provides only the harmonizing scripts but not the full corpus data) are available in HamleDT but not in UD.
This paper presents and discusses the results of an experiment testing the validity of the Trace Deletion Hypothesis (Grodzinsky 1989, 2000) in Czech. The Trace Deletion Hypothesis (TDH) was proposed to account for a receptive syntactic deficit in Broca’s aphasics that involves structures containing transformational operations such as the passive. According to the TDH, in passive constructions Broca’s aphasics fail to assign a semantic θ-role to the derived subject syntactically, so they assign it the Agent θ-role on the basis of linear order (Default Principle), which results in a structure with two potential Agents. This strategy is supposed to lead to chance performance of Broca’s aphasics on these structures, as they are forced to guess the distribution of the Agent and Patient θ-roles. The results of our experiment, however, do not support the TDH proposal: out of the six tested subjects, only one performed at chance. The error rate for reversible passive structures in Czech was 33.34%, which corresponds to above-chance performance. Given these results, the validity of the TDH is called into question, also with respect to the development of generative theory itself.
Andrea Hudousková, Eva Flanderková, Barbara Mertins, Kristýna Tomšů. Includes a list of references.
This package contains data used in the IWPT 2020 shared task. It contains training, development and test (evaluation) datasets. The data is based on a subset of Universal Dependencies release 2.5 (http://hdl.handle.net/11234/1-3105) but some treebanks contain additional enhanced annotations. Moreover, not all of these additions became part of Universal Dependencies release 2.6 (http://hdl.handle.net/11234/1-3226), which makes the shared task data unique and worth a separate release to enable later comparison with new parsing algorithms. The package also contains a number of Perl and Python scripts that have been used to process the data during preparation and during the shared task. Finally, the package includes the official primary submission of each team participating in the shared task.
This package contains data used in the IWPT 2021 shared task. It contains training, development and test (evaluation) datasets. The data is based on a subset of Universal Dependencies release 2.7 (http://hdl.handle.net/11234/1-3424) but some treebanks contain additional enhanced annotations. Moreover, not all of these additions became part of Universal Dependencies release 2.8 (http://hdl.handle.net/11234/1-3687), which makes the shared task data unique and worth a separate release to enable later comparison with new parsing algorithms. The package also contains a number of Perl and Python scripts that have been used to process the data during preparation and during the shared task. Finally, the package includes the official primary submission of each team participating in the shared task.
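In CoNLL-U files, enhanced annotations of the kind mentioned above are stored in the DEPS column as `head:relation` pairs separated by `|`. The following is a minimal Python sketch for reading that column, assuming this standard encoding; `parse_deps` is a hypothetical helper and not one of the scripts included in the package.

```python
from typing import List, Tuple

def parse_deps(deps: str) -> List[Tuple[str, str]]:
    """Split a CoNLL-U DEPS value into (head, relation) pairs."""
    if deps == "_":                  # underscore: no enhanced graph given
        return []
    pairs = []
    for item in deps.split("|"):
        # Split on the first colon only: relations may contain colons
        # themselves (e.g. "conj:and"), and heads may be empty-node
        # ids such as "5.1", so the head is kept as a string.
        head, rel = item.split(":", 1)
        pairs.append((head, rel))
    return pairs

# Hypothetical DEPS value, not taken from the shared task data.
example = parse_deps("2:nsubj|5.1:conj:and")
```

Keeping the head as a string rather than an integer is a deliberate choice here, since enhanced graphs may attach a token to an empty node whose id is not an integer.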
Mapping table for the article Hajič et al., 2024: Mapping Czech Verbal Valency to PropBank Argument Labels, in LREC-COLING 2024, as preprocessed by the algorithm described in the paper. This dataset is meant for verification (replication) purposes only. It will be manually processed further to arrive at a workable Czech PropBank, to be used in Czech UMR annotation and further updated during the annotation. The resulting PropBank frame files for Czech are expected to be available with some future releases of UMR containing Czech UMR annotation, or separately.
NomVallex 2.0 is a manually annotated valency lexicon of Czech nouns and adjectives, created in the theoretical framework of the Functional Generative Description and based on corpus data (the SYN series of corpora from the Czech National Corpus and the Araneum Bohemicum Maximum corpus). In total, NomVallex is comprised of 1027 lexical units contained in 570 lexemes, covering the following parts-of-speech and derivational categories: deverbal or deadjectival nouns, and deverbal, denominal, deadjectival or primary adjectives. Valency properties of a lexical unit are captured in a valency frame (modeled as a sequence of valency slots, each supplemented with a list of morphemic forms) and documented by corpus examples. In order to make it possible to study the relationship between valency behavior of base words and their derivatives, lexical units of nouns and adjectives in NomVallex are linked to their respective base lexical units (contained either in NomVallex itself or, in case of verbs, in the VALLEX lexicon), linking up to three parts-of-speech (i.e., noun – verb, adjective – verb, noun – adjective, and noun – adjective – verb).
In order to facilitate comparison, this submission also contains abbreviated entries of the base verbs of these nouns and adjectives from the VALLEX lexicon and simplified entries of the covered nouns and adjectives from the PDT-Vallex lexicon.