The authors present their respective views on the development of Czech post-war syntactic studies. Their approach is influenced by the fact that they were educated in different syntactic schools: the paper thus combines the Prague and Brno views. V. Šmilauer's Novočeská skladba (Syntax of Modern Czech, 1947) is understood as the source of contemporary research on Czech syntax. The paper describes the results reached by individual investigators as well as the results of research teams. In the authors' opinion, Two-Level Valency Syntax (represented by F. Daneš and his close collaborators and reflected in the Czech Academic Grammar) and Functional Generative Grammar (developed by P. Sgall and his colleagues) have formed the main paradigms of Czech syntax since 1960. Both theories incorporate the results of the classical Praguian functional approach as well as results of the generative paradigm. The authors conclude that the Prague and Brno views on the development of Czech syntactic studies are not incompatible but rather complementary, and that the methods of formal and corpus linguistics are attractive and useful for young researchers.
The necessity to distinguish between ontological (cognitive, extralinguistic) content and linguistic (“literal”) meaning has its sources in European structural linguistics. The idea that the task of linguistics itself is to study language in its “form” rather than in its “substance” was further elaborated in the Prague Linguistic Circle. However, when analyzing concrete language data we often face many open questions: it is not always clear how to separate knowledge of language from knowledge of the world, which general criteria could be used to distinguish (language) ambiguity from vagueness, etc. The present contribution does not aim to resolve these non-trivial distinctions; we only present some Czech examples as a challenge for consideration, which we believe to be useful for determining this boundary. The examples, which belong to different phenomena of language structure, are analyzed from the point of view of the asymmetry between the layer of content and the layer of meaning. Examples with different aspectual and tense forms illustrate the asymmetry “same content - different meanings”. Reflexive forms, the dative case dependent on the verb, and coreference with infinitival and other constructions serve as examples of the situation where instances of different content are not articulated as oppositions in linguistic meaning but rather display structural ambiguity. Despite these problems, we are convinced that a description of the language system is impossible unless the distinction between linguistic meaning and cognitive content is maintained during the analysis of language data.
In the present paper, two pairs of terms and notions are discussed with regard to their benefit for syntactic studies. The notions of coordination and subordination and their counterparts, parataxis and hypotaxis, are studied in relation to the domain of linguistic meaning and to the domain of language form, respectively. The asymmetry between them is studied on selected Czech data. Czech constructions classified in Czech syntactic handbooks as hypotactical forms of coordination are analyzed. In the syntactic structure of Otec s matkou odjeli do lázní [Father with mother went to the spa], the possible plural agreement of the predicate demonstrates a hypotactical patterning of the coordination between father and mother. The “false” subordinate clauses are presented as another example of hypotactic coordination. On the other hand, the nominal constructions introduced by the expression místo [instead of] and the ambiguous expression kromě [beside/with exception] are excluded from the domain of asymmetry, and a proposal is formulated to classify them as specific types of adverbials (a substitution, an addition, and an exception).
LiFR-Law is a corpus of Czech legal and administrative texts with measured reading comprehension and a subjective expert annotation of diverse textual properties based on the Hamburg Comprehensibility Concept (Langer, Schulz von Thun, Tausch, 1974). It has been built as a pilot data set to explore the Linguistic Factors of Readability (hence the LiFR acronym) in Czech administrative and legal texts, modeling their correlation with actually observed reading comprehension. The corpus is comprised of 18 documents in total; that is, six different texts from the legal/administration domain, each in three versions: the original and two paraphrases. Each such document triple shares one reading-comprehension test administered to at least thirty readers of random gender, educational background, and age. The data set also captures basic demographic information about each reader, their familiarity with the topic, and their subjective assessment of the stylistic properties of the given document, roughly corresponding to the key text properties identified by the Hamburg Comprehensibility Concept.
Changes to the previous version and helpful comments
• File names of the comprehension test results (self-explanatory)
• Corrected one erroneous automatic evaluation rule in the multiple-choice evaluation (zahradnici_3,
TRUE and FALSE had been swapped)
• Evaluation protocols for both question types added to the folder lifr_formr_study_design
• Data has been cleaned: empty responses to multiple-choice questions were re-inserted. A survey is now
considered complete if the reader’s subjective text evaluation is complete (these questions were placed at
the very end of each survey).
• Only complete surveys (all 7 content questions answered) are represented. We dropped the replies of
six users who did not complete their surveys.
• A few missing responses to open questions have been detected and re-inserted.
• The demographic data contain all respondents who filled in the informed consent and the demographic
details, with respondents who did not complete any test survey (but provided their demographic
details) in a separate file. All other data have been cleaned to contain only responses by the regular
respondents (at least one completed survey).
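The split of respondents described in the last item can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`Survey`, `respondent_id`, `document_id`, `completed`) and both function names are hypothetical and do not reflect the actual file format of the data set; the `completed` flag is assumed to have been set beforehand according to the completeness criterion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Survey:
    """Hypothetical record of one test survey taken by one respondent."""
    respondent_id: str
    document_id: str
    completed: bool  # assumed to be precomputed from the completeness criterion


def split_respondents(consented_ids, surveys):
    """Split consenting respondents into 'regular' ones (at least one
    completed survey) and those who completed no survey at all."""
    finished = {s.respondent_id for s in surveys if s.completed}
    regular = [r for r in consented_ids if r in finished]
    no_survey = [r for r in consented_ids if r not in finished]
    return regular, no_survey


def keep_regular_responses(surveys, regular):
    """Keep only completed surveys that belong to regular respondents."""
    regular_set = set(regular)
    return [s for s in surveys if s.completed and s.respondent_id in regular_set]
```

With such records, the first list would correspond to the main demographic file and the second to the separate file of respondents without any completed survey.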
Corpus of Czech educational texts for readability studies, with paraphrases, measured reading comprehension, and a multi-annotator subjective rating of selected text features based on the Hamburg Comprehensibility Concept
The valency lexicon PDT-Vallex 4.0 has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT, the spoken language corpus (PDTSC) and the corpus of user-generated texts in the project Faust). It contains over 14500 valency frames for almost 8500 verbs which occurred in the PDT, PCEDT, PDTSC and Faust corpora. In addition, there are nouns, adjectives and adverbs, linked from the PDT part only, increasing the total to over 17000 valency frames for 13000 words. All the corpora were published in 2020 as the PDT-C 1.0 corpus with the PDT-Vallex 4.0 dictionary included; this is a copy of the dictionary published as a separate item for those not interested in the corpora themselves. It is available in an electronically processable format (XML), and also in a more human-readable form including corpus examples (see the WEBSITE link below, and the links to its main publications elsewhere in this metadata). The main feature of the lexicon is its linking to the annotated corpora: each occurrence of each verb is linked to the appropriate valency frame, with additional (generalized) information about its usage and surface morphosyntactic form alternatives. It replaces the previously published unversioned edition of PDT-Vallex from 2014.