Subject: lexicon - LINDAT/CLARIAH-CZ Catalog Search Results


1. CALEM (Comprehensive Arabic LEMmas)

Creator:: Namly, Driss, Bouzoubaa, Karim, and El Jihad, Abdelhamid
Publisher:: ALELM
Type:: text, lexicon, and lexicalConceptualResource
Subject:: lexicon, lemmatization, and stemming
Language:: Arabic
Description:: Comprehensive Arabic LEMmas is a lexicon covering a large list of Arabic lemmas and their corresponding inflected word forms (stems) with details (POS + Root). Each lexical entry represents a lemma followed by all its possible stems, and each stem is enriched with its morphological features, especially the root and the POS. It is composed of 164,845 lemmas representing 7,200,918 stems, detailed as follows: 757 Arabic particles, 2,464,631 verbal stems, and 4,735,587 nominal stems. The lexicon is provided as an LMF-conformant XML file in UTF-8 encoding, which represents about 1.22 GB of data. Citation: Namly Driss, Karim Bouzoubaa, Abdelhamid El Jihad, and Si Lhoussain Aouragh. “Improving Arabic Lemmatization Through a Lemmas Database and a Machine-Learning Technique.” In Recent Advances in NLP: The Case of Arabic Language, pp. 81-100. Springer, Cham, 2020.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

2. Czech Verbal MWEs

Creator:: Bejček, Eduard
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: lexicon, verbs, multiword expressions, forms, and lemmatization
Language:: Czech
Description:: Lexicon of Czech verbal multiword expressions (VMWEs) used in the Parseme Shared Task 2017 (https://typo.uni-konstanz.de/parseme/index.php/2-general/142-parseme-shared-task-on-automatic-detection-of-verbal-mwes). The lexicon consists of 4785 VMWEs, categorized into four categories according to the Parseme Shared Task (PST) typology: IReflV (inherently reflexive verbs), LVC (light verb constructions), ID (idiomatic expressions) and OTH (other VMWEs with other than verbal syntactic head). Verbal multiword expressions as well as deverbative variants of VMWEs were annotated during the preparation phase of PST. These data were published as http://hdl.handle.net/11372/LRT-2282. The Czech part includes 14,536 VMWE occurrences: 1611 ID, 10000 IReflV, 2923 LVC, and 2 OTH. This lexicon was created out of the Czech data. Each lexicon entry is represented by one line in the form: type lemmas frequency PoS [used form 1; used form 2; ... ] (columns are separated by tabs), where:
type ... is the type of VMWE in the PST typology
lemmas ... are space-separated lemmatized forms of all words that constitute the VMWE
frequency ... is the absolute frequency of this item in the PST data
PoS ... is a space-separated list of parts of speech of the individual words (in the same order as in "lemmas")
The final field contains a list of all (1 to 18) used forms found in the data (since Czech is an inflectional language).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
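
The one-line entry format described for the Czech Verbal MWEs lexicon can be read with a few lines of Python. The following is only a minimal sketch under stated assumptions: the class name VmweEntry, the function name parse_vmwe_line, and the sample line are hypothetical and not part of the resource; the sketch also assumes that the final field may be wrapped in literal square brackets and that the used forms are separated by semicolons, as the description suggests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmweEntry:
    """One lexicon line (hypothetical container, not part of the resource)."""
    vmwe_type: str      # PST category, e.g. IReflV, LVC, ID, OTH
    lemmas: List[str]   # lemmatized words that make up the VMWE
    frequency: int      # absolute frequency in the PST data
    pos: List[str]      # parts of speech, same order as lemmas
    forms: List[str]    # used forms found in the data


def parse_vmwe_line(line: str) -> VmweEntry:
    """Parse one tab-separated line: type, lemmas, frequency, PoS, forms."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    vmwe_type, lemmas, frequency, pos, forms = fields

    # Assumption: the form list may be enclosed in square brackets.
    forms = forms.strip()
    if forms.startswith("[") and forms.endswith("]"):
        forms = forms[1:-1]

    return VmweEntry(
        vmwe_type=vmwe_type,
        lemmas=lemmas.split(),
        frequency=int(frequency),
        pos=pos.split(),
        # Assumption: forms are separated by semicolons.
        forms=[f.strip() for f in forms.split(";") if f.strip()],
    )


# Made-up sample line for illustration only; the PoS tags are placeholders.
sample = "IReflV\tbát se\t12\tVERB PRON\t[bojí se; bál se]"
entry = parse_vmwe_line(sample)
print(entry.vmwe_type, entry.lemmas, entry.frequency, entry.forms)
```

The actual tag set in the PoS column and the exact bracket and separator conventions should be checked against the distributed file before relying on this sketch.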

3. CzeDLex 0.5

Creator:: Mírovský, Jiří, Synková, Pavlína, Rysová, Magdaléna, and Poláková, Lucie
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: lexicon and discourse annotation
Language:: Czech
Description:: CzeDLex 0.5 is a pilot version of a lexicon of Czech discourse connectives. The lexicon contains connectives partially automatically extracted from the Prague Discourse Treebank 2.0 (PDiT 2.0), a large corpus annotated manually with discourse relations. The most frequent entries in the lexicon (covering more than 2/3 of the discourse relations annotated in the PDiT 2.0) have been manually checked, translated to English and supplemented with additional linguistic information.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

4. CzeDLex 0.6

Creator:: Synková, Pavlína, Poláková, Lucie, Mírovský, Jiří, and Rysová, Magdaléna
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: lexicon and discourse annotation
Language:: Czech
Description:: CzeDLex 0.6 is the second development version of the lexicon of Czech discourse connectives. The lexicon contains connectives partially automatically extracted from the Prague Discourse Treebank 2.0 (PDiT 2.0), a large corpus annotated manually with discourse relations. The most frequent entries in the lexicon (76 out of total 204 entries, covering more than 90% of the discourse relations annotated in PDiT 2.0), have been manually checked, translated to English and supplemented with additional linguistic information.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

5. CzeDLex 0.7

Creator:: Poláková, Lucie, Mírovský, Jiří, Synková, Pavlína, Kloudová, Věra, and Rysová, Magdaléna
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: lexicon and discourse annotation
Language:: Czech
Description:: CzeDLex 0.7 is the third development version of the Lexicon of Czech discourse connectives. The lexicon contains connectives partially automatically extracted from the Prague Discourse Treebank 2.0 (PDiT 2.0) and, as a supplementary resource, the Czech part of the Prague Czech–English Dependency Treebank with discourse annotation projected from the Penn Discourse Treebank 3.0. The most frequent entries in the lexicon (131 out of total 218 entries, covering more than 95% of discourse relations annotated in PDiT 2.0), have been manually checked, translated to English and supplemented with additional linguistic information.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

6. CzeDLex 1.0

Creator:: Mírovský, Jiří, Synková, Pavlína, Poláková, Lucie, Kloudová, Věra, and Rysová, Magdaléna
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: lexicon and discourse
Language:: Czech
Description:: CzeDLex 1.0 is the first production version (the fourth development version) of the Lexicon of Czech discourse connectives. The lexicon contains connectives partially automatically extracted from resources annotated manually with discourse relations: the Prague Discourse Treebank 2.0 (PDiT 2.0) as the primary resource, and two supplementary resources: (i) the Czech part of the Prague Czech–English Dependency Treebank with discourse annotation projected from the Penn Discourse Treebank 3.0, and (ii) a thousand sentences selected from various fiction novels and transcriptions of public speeches. All 200 entries in the lexicon have been manually checked, translated to English and supplemented with additional linguistic information.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

7. CzEngVallex

Creator:: Urešová, Zdeňka, Fučíková, Eva, Hajič, Jan, and Šindlerová, Jana
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: verbal valency, argument structure, valency frame, lexicon, corpus annotation, translation equivalent, comparative syntax, comparative semantics, and valency annotation
Language:: English
Description:: CzEngVallex is a bilingual valency lexicon of corresponding Czech and English verbs. It connects 20835 aligned valency frame pairs (verb senses) which are translations of each other, aligning their arguments as well. CzEngVallex serves as a powerful, real-text-based database of frame-to-frame and subsequently argument-to-argument pairs and can be used, for example, in machine translation applications. It uses the data from the Prague Czech-English Dependency Treebank project (PCEDT 2.0, http://hdl.handle.net/11858/00-097C-0000-0015-8DAF-4) and it also takes advantage of two existing valency lexicons: PDT-Vallex for Czech and EngVallex for English, using the same view of valency (based on the Functional Generative Description theory). CzEngVallex is available in an XML format in the LINDAT/CLARIN repository, and also in a searchable form (see the “More Apps” tab) interlinked with PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2) and with examples from the PCEDT.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

8. Dva pohledy na vývoj českého poválečného syntaktického myšlení

Creator:: Karlík, Petr and Panevová, Jarmila
Format:: bez média and svazek
Type:: model:article and TEXT
Subject:: syntax, linguistic theory, structuralism, generative grammar, two-level valency syntax, functional generative description, grammar, lexicon, teorie jazyka, strukturalismus, generativní gramatika, dvourovinná valenční syntax, funkční generativní popis, gramatika, and slovník
Language:: Czech
Description:: The authors present their respective views on the development of Czech post-war syntactic studies. Their approach is influenced by the fact that they were educated in different syntactic schools: thus the paper is a combination of Prague’s and Brno’s views. V. Šmilauer’s Novočeská skladba (Syntax of Modern Czech, 1947) is understood as a source of the contemporary research on Czech syntax. The paper describes the results reached by individual investigators as well as the results of the research teams. In the authors’ opinion, Two-Level Valency Syntax (represented by F. Daneš and his close collaborators and reflected in the Czech Academic Grammar) and Functional Generative Grammar (developed by P. Sgall and his colleagues) have formed the main paradigms of Czech syntax since 1960. Both theories incorporate the results of the classical Praguian functional approach as well as results of the generative paradigm. The authors conclude that Prague’s and Brno’s views on the development of Czech syntactic studies are not incompatible but rather complementary and that the methods of formal and corpus linguistics are attractive and useful for young researchers.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

9. EngVallex - English Valency Lexicon 2.0

Creator:: Cinková, Silvie, Fučíková, Eva, Šindlerová, Jana, and Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, computationalLexicon, and lexicalConceptualResource
Subject:: Annotations, corpus, linguistic data, lexicon, lexical semantics, Monolingual, semantics, verbal valency, and valency
Language:: English
Description:: EngVallex 2.0 is a slightly updated version of EngVallex. It is the English counterpart of the PDT-Vallex valency lexicon, using the same view of valency, valency frames and the description of a surface form of verbal arguments. EngVallex also contains links to PropBank (an English predicate-argument lexicon). The EngVallex lexicon is fully linked to the English side of the PCEDT parallel treebank(s), which is in fact the PTB re-annotated using the Prague Dependency Treebank style of annotation. EngVallex is available in an XML format in our repository, and also in a searchable form with examples from the PCEDT. EngVallex 2.0 is the same dataset as the EngVallex lexicon packaged with the PCEDT 3.0 corpus, but published separately under a more permissive licence, avoiding the need for the LDC licence which is tied to PCEDT 3.0 as a whole.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

10. Lexicon of Czech and German Anaphoric Connectives

Creator:: Rysová, Kateřina, Poláková, Lucie, Rysová, Magdaléna, and Mírovský, Jiří
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: lexicon, discourse, and bilingual
Language:: Czech and German
Description:: GeCzLex 1.0 is an online electronic resource for translation equivalents of Czech and German discourse connectives. It contains anaphoric connectives for both languages and their possible translations documented in bilingual parallel corpora (not necessarily anaphoric). The entries have been interlinked via semantic annotation of the connectives (taken from the monolingual lexicons of connectives CzeDLex and DiMLex) according to the PDTB 3 sense taxonomy and translation possibilities acquired from the Czech and German parallel data of the Intercorp project. The lexicon is the first bilingual inventory of connectives with linkage on the level of individual pairs (connective + discourse sense).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

