An XML-based file containing the electronic version of the al wassit dictionary, an Arabic monolingual dictionary compiled by the Academy of the Arabic Language in Cairo.
An LMF-conformant XML-based file containing the electronic version of the al wassit dictionary, an Arabic monolingual dictionary compiled by the Academy of the Arabic Language in Cairo.
An XML-based file containing all Arabic characters (letters, vowels and punctuation marks). Each character is described with a description, its different display forms (isolated, and at the beginning, middle and end of a word), a codification (Unicode; others could be added later), and two transliterations (Buckwalter and wiki).
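As a rough illustration of the kind of per-character record described above, the following Python sketch pairs each basic letter with its Unicode code point and its Buckwalter transliteration. The names `LETTERS`, `BUCKWALTER`, `describe` and `to_buckwalter` are assumptions made for this example. They do not reflect the element names or layout of the XML file, and vowels, punctuation and positional display forms are left out.

```python
# Illustrative only: a small in-memory table, not the structure of the XML file.
# The 28 basic letters in alphabetical order, with their Buckwalter symbols.
LETTERS = "ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي".split()
BUCKWALTER = "A b t v j H x d * r z s $ S D T Z E g f q k l m n h w y".split()

TO_BUCKWALTER = dict(zip(LETTERS, BUCKWALTER))


def describe(char):
    """Return a small record for one letter (hypothetical field names)."""
    return {
        "char": char,
        "unicode": f"U+{ord(char):04X}",
        "buckwalter": TO_BUCKWALTER[char],
    }


def to_buckwalter(text):
    """Transliterate letter by letter; unknown characters pass through unchanged."""
    return "".join(TO_BUCKWALTER.get(ch, ch) for ch in text)


record = describe("ك")
word = to_buckwalter("كتب")
```

Here `record` holds the code point and transliteration of the letter kaf, and `word` is the Buckwalter form of the three-letter sequence k-t-b.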
Description: This XML file is a lexicon containing all 21952 (28x28x28) Arabic triliteral combinations (roots). The file is split into three parts, as follows. The first part contains the phonetic constraints that must be taken into account in the formation of Arabic roots (for more details see all_phonetic_rules.xml in http://arabic.emi.ac.ma/alelm/?q=Resources). The second part contains the lexicons that were used to create this lexicon (see the lexicons tag). The third part contains the roots.
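The figure of 21952 follows from taking every ordered triple over the 28 letters of the alphabet. The Python sketch below only checks that arithmetic by enumeration. The names `LETTERS` and `triliteral_combinations` are assumptions for this example, and the sketch does not apply the phonetic constraints, whose details are in all_phonetic_rules.xml.

```python
from itertools import product

# The 28 basic Arabic letters (assumed inventory for this illustration).
LETTERS = "ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي".split()


def triliteral_combinations(letters=LETTERS):
    """Return every ordered three-letter combination, repeated letters included."""
    return ["".join(triple) for triple in product(letters, repeat=3)]


combinations = triliteral_combinations()
total = len(combinations)  # 28 * 28 * 28
```

Because repetition is allowed and order matters, the count is 28 to the third power, which matches the number of entries stated in the description.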
ISLRN: 813-907-570-946-2
Description: This is an online edition of An Anglo-Saxon Dictionary, or a dictionary of "Old English". The dictionary records the state of the English language as it was used between ca. 700-1100 AD by the Anglo-Saxon inhabitants of the British Isles.
This project is based on a digital edition of An Anglo-Saxon Dictionary, based on the manuscript collections of the late Joseph Bosworth (the so-called Main Volume, first edition 1898) and its Supplement (first edition 1921), edited by Joseph Bosworth and T. Northcote Toller, today the largest complete dictionary of Old English (hopefully to be supplanted one day by the DOE). Alistair Campbell's "enlarged addenda and corrigenda" from 1972 are not in the public domain and are therefore not part of the online dictionary. Please see the front & back matter of the paper dictionary for further information, prefaces and lists of references & contractions.
The digitization project was initiated by Sean Crist in 2001 as a part of his Germanic Lexicon Project and many individuals and institutions have contributed to this project. Check out the original GLP webpage and the old Bosworth-Toller offline application webpage (to be updated). Currently the project is hosted by the Faculty of Arts, Charles University.
In 2010, the data from the GLP were converted to create the current site. Care was taken to preserve the typography of the original dictionary, but also to provide a modern, user-friendly interface for contemporary users.
In 2013, the entries were structurally re-tagged and the original typography was abandoned, though the immediate access to the scans of the paper dictionary was preserved.
Our aim is to reach beyond a simple digital edition and create an online environment dedicated to all interested in Old English and Anglo-Saxon culture. Feel free to join in the editing of the Dictionary, commenting on its numerous entries or participating in the discussions at our forums.
We hope that by drawing the attention of the community of Anglo-Saxonists to our site and joining our resources, we may create a more useful tool for everybody. The most immediate project to draw on the corrected and tagged data of the Dictionary is a Morphological Analyzer of Old English (currently under development).
We are grateful for the generous support of the Charles University Grant Agency and for the free hosting at the Faculty of Arts at Charles University. The site is currently maintained and developed by Ondrej Tichy et al. at the Department of English Language and ELT Methodology, Faculty of Arts, Charles University in Prague (Czech Republic).
Comprehensive Arabic LEMmas is a lexicon covering a large list of Arabic lemmas and their corresponding inflected word forms (stems) with details (POS + Root). Each lexical entry represents a lemma followed by all its possible stems and each stem is enriched by its morphological features especially the root and the POS.
It is composed of 164,845 lemmas representing 7,200,918 stems, detailed as follows:
757 Arabic particles
2,464,631 verbal stems
4,735,587 nominal stems
The lexicon is provided as an LMF-conformant XML-based file in UTF-8 encoding, which represents about 1.22 GB of data.
Citation:
– Namly Driss, Karim Bouzoubaa, Abdelhamid El Jihad, and Si Lhoussain Aouragh. “Improving Arabic Lemmatization Through a Lemmas Database and a Machine-Learning Technique.” In Recent Advances in NLP: The Case of Arabic Language, pp. 81-100. Springer, Cham, 2020.
An XML-based file containing the electronic version of the al logha al arabia al moassira (Contemporary Arabic) dictionary, an Arabic monolingual dictionary compiled by Ahmed Mukhtar Abdul Hamid Omar (deceased: 1424) with the help of a working group.
Lexicon of Czech verbal multiword expressions (VMWEs) used in Parseme Shared Task 2017. https://typo.uni-konstanz.de/parseme/index.php/2-general/142-parseme-shared-task-on-automatic-detection-of-verbal-mwes
The lexicon consists of 4785 VMWEs, classified into four categories according to the Parseme Shared Task (PST) typology: IReflV (inherently reflexive verbs), LVC (light verb constructions), ID (idiomatic expressions) and OTH (other VMWEs with other than verbal syntactic head).
Verbal multiword expressions as well as deverbative variants of VMWEs were annotated during the preparation phase of PST. These data were published as http://hdl.handle.net/11372/LRT-2282. The Czech part includes 14,536 VMWE occurrences:
1611 ID
10000 IReflV
2923 LVC
2 OTH
This lexicon was created out of Czech data. Each lexicon entry is represented by one line in the form:
type lemmas frequency PoS [used form 1; used form 2; ... ]
(columns are separated by tabs) where:
type ... is the type of VMWE in PST typology
lemmas ... are space separated lemmatized forms of all words that constitute the VMWE
frequency ... is the absolute frequency of this item in PST data
PoS ... is a space separated list of parts of speech of individual words (in the same order as in "lemmas")
the final field contains a list of all (1 to 18) used forms found in the data (since Czech is an inflectional language).
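The line format described above can be read with a few lines of Python. This is a minimal sketch, not a tool shipped with the lexicon: the names `VmweEntry` and `parse_vmwe_line` are assumptions, the sample line is made up for illustration (including its frequency and part-of-speech tags), and the sketch assumes that the used forms are enclosed in square brackets and separated by semicolons, as the format line suggests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmweEntry:
    vmwe_type: str      # type of the VMWE in the PST typology
    lemmas: List[str]   # lemmatized forms of all words of the VMWE
    frequency: int      # absolute frequency in the PST data
    pos: List[str]      # parts of speech, in the same order as the lemmas
    forms: List[str]    # used forms found in the data


def parse_vmwe_line(line: str) -> VmweEntry:
    """Parse one tab-separated lexicon line into a VmweEntry."""
    vmwe_type, lemmas, frequency, pos, forms = line.rstrip("\n").split("\t")
    forms = forms.strip()
    # Drop the surrounding brackets if present (assumed layout of the last field).
    if forms.startswith("[") and forms.endswith("]"):
        forms = forms[1:-1]
    used_forms = [form.strip() for form in forms.split(";") if form.strip()]
    return VmweEntry(vmwe_type, lemmas.split(), int(frequency), pos.split(), used_forms)


# Made-up sample line, for illustration only.
sample = "IReflV\tbát se\t3\tVERB PRON\t[bál se; bojí se]"
entry = parse_vmwe_line(sample)
```

Splitting on tabs first and on spaces second keeps multiword lemmas and their part-of-speech lists aligned by position.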
CzeDLex 0.5 is a pilot version of a lexicon of Czech discourse connectives. The lexicon contains connectives partially automatically extracted from the Prague Discourse Treebank 2.0 (PDiT 2.0), a large corpus annotated manually with discourse relations. The most frequent entries in the lexicon (covering more than 2/3 of the discourse relations annotated in the PDiT 2.0) have been manually checked, translated to English and supplemented with additional linguistic information.
CzeDLex 0.6 is the second development version of the lexicon of Czech discourse connectives. The lexicon contains connectives partially automatically extracted from the Prague Discourse Treebank 2.0 (PDiT 2.0), a large corpus annotated manually with discourse relations. The most frequent entries in the lexicon (76 out of total 204 entries, covering more than 90% of the discourse relations annotated in PDiT 2.0), have been manually checked, translated to English and supplemented with additional linguistic information.
CzeDLex 0.7 is the third development version of the Lexicon of Czech discourse connectives. The lexicon contains connectives partially automatically extracted from the Prague Discourse Treebank 2.0 (PDiT 2.0) and, as a supplementary resource, the Czech part of the Prague Czech–English Dependency Treebank with discourse annotation projected from the Penn Discourse Treebank 3.0. The most frequent entries in the lexicon (131 out of total 218 entries, covering more than 95% of discourse relations annotated in PDiT 2.0), have been manually checked, translated to English and supplemented with additional linguistic information.
CzeDLex 1.0 is the first production version (the fourth development version) of the Lexicon of Czech discourse connectives. The lexicon contains connectives partially automatically extracted from resources annotated manually with discourse relations: the Prague Discourse Treebank 2.0 (PDiT 2.0) as the primary resource, and two supplementary resources: (i) the Czech part of the Prague Czech–English Dependency Treebank with discourse annotation projected from the Penn Discourse Treebank 3.0, and (ii) a thousand sentences selected from various fiction novels and transcriptions of public speeches. All 200 entries in the lexicon have been manually checked, translated to English and supplemented with additional linguistic information.
The CzEngClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language resources, i.e., relating verb meanings with respect to contextually-based verb synonymy. The lexicon entries are linked to PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2), CzEngVallex (http://hdl.handle.net/11234/1-1512), FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), VerbNet (http://verbs.colorado.edu/verbnet/index.html), PropBank (http://verbs.colorado.edu/%7Empalmer/projects/ace.html), Ontonotes (http://verbs.colorado.edu/html_groupings/), and Czech (http://hdl.handle.net/11858/00-097C-0000-0001-4880-3) and English Wordnets (https://wordnet.princeton.edu/). The dataset also includes a file reflecting annotators' choices in the assignment of verbs to classes.
The CzEngClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language resources, i.e., relating verb meanings with respect to contextually-based verb synonymy. The lexicon entries are linked to PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2), CzEngVallex (http://hdl.handle.net/11234/1-1512), FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), VerbNet (http://verbs.colorado.edu/verbnet/index.html), PropBank (http://verbs.colorado.edu/%7Empalmer/projects/ace.html), Ontonotes (http://verbs.colorado.edu/html_groupings/), and Czech (http://hdl.handle.net/11858/00-097C-0000-0001-4880-3) and English Wordnets (https://wordnet.princeton.edu/). The dataset also includes files reflecting annotators' choices and agreement in the assignment of verbs to classes.
The CzEngClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language resources, i.e., relating verb meanings with respect to contextually-based verb synonymy. The lexicon entries are linked to PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2), CzEngVallex (http://hdl.handle.net/11234/1-1512), FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), VerbNet (http://verbs.colorado.edu/verbnet/index.html), PropBank (http://verbs.colorado.edu/%7Empalmer/projects/ace.html), Ontonotes (http://verbs.colorado.edu/html_groupings/), and Czech (http://hdl.handle.net/11858/00-097C-0000-0001-4880-3) and English Wordnets (https://wordnet.princeton.edu/).
CzEngVallex is a bilingual valency lexicon of corresponding Czech and English verbs. It connects 20835 aligned valency frame pairs (verb senses) which are translations of each other, aligning their arguments as well. The CzEngVallex serves as a powerful, real-text-based database of frame-to-frame and subsequently argument-to-argument pairs and can be used, for example, for machine translation applications. It uses the data from the Prague Czech-English Dependency Treebank project (PCEDT 2.0, http://hdl.handle.net/11858/00-097C-0000-0015-8DAF-4) and it also takes advantage of two existing valency lexicons: PDT-Vallex for Czech and EngVallex for English, using the same view of valency (based on the Functional Generative Description theory). The CzEngVallex is available in an XML format in the LINDAT/CLARIN repository, and also in a searchable form (see the “More Apps” tab) interlinked with PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2) and with examples from the PCEDT.
GeCzLex 1.0 is an online electronic resource for translation equivalents of Czech and German discourse connectives. It contains anaphoric connectives for both languages and their possible translations documented in bilingual parallel corpora (not necessarily anaphoric). The entries have been interlinked via semantic annotation of the connectives (taken from monolingual lexicons of connectives CzeDLex and DiMLex) according to the PDTB 3 sense taxonomy and translation possibilities acquired from the Czech and German parallel data of the Intercorp project. The lexicon is the first bilingual inventory of connectives with linkage on the level of individual pairs (connective + discourse sense).
An LMF-conformant XML-based file containing all Arabic characters (letters, vowels and punctuation marks). Each character is described with a description, its different display forms (isolated, and at the beginning, middle and end of a word), a codification (Unicode; others could be added later), and two transliterations (Buckwalter and wiki).
An LMF-conformant XML-based file containing the electronic version of the al logha al arabia al moassira (Contemporary Arabic) dictionary, an Arabic monolingual dictionary compiled by Ahmed Mukhtar Abdul Hamid Omar (deceased: 1424) with the help of a working group.
A dictionary of morphologically segmented word forms in Czech. Rules of manual segmentation are described in Pelegrinová, K., Mačutek, J., Čech, R. (2021). The Menzerath-Altmann law as the relation between lengths of words and morphemes in Czech. Jazykovedný časopis, 72, 405-414. The dictionary is based on short stories, fairy tales, letters and studies written by Karel Čapek.
The valency lexicon PDT-Vallex has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT). It contains over 11000 valency frames for more than 7000 verbs which occurred in the PDT or PCEDT. It is available in electronically processable format (XML) together with the aforementioned treebanks (to be viewed and edited by TrEd, the PDT/PCEDT main annotation tool), and also in more human readable form including corpus examples (see the WEBSITE link below). The main feature of the lexicon is its linking to the annotated corpora - each occurrence of each verb is linked to the appropriate valency frame with additional (generalized) information about its usage and surface morphosyntactic form alternatives.
The data contains the morphemic dictionary scanned in the PDF format. It is divided into 3 parts:
introductions.pdf - pp. 11-102
main_dictionary.pdf - pp. 113-506
appendices.pdf - pp. 509-645
The file contains all Czech verbs included in the Retrograde Morphemic Dictionary of Czech Language (Slavíčková Eleonora, Academia 1975).
The data was obtained by scanning the portion of the dictionary that contains words ending in -ci and -ti. Among them, there were 18 non-verbs, which were removed. Using OCR, the data was converted into plain text format and the result was checked by two independent readers. If you nevertheless encounter a remaining error, please report it.
The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language resources, i.e., relating verb meanings with respect to contextually-based verb synonymy. The lexicon entries are linked to PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2), CzEngVallex (http://hdl.handle.net/11234/1-1512), FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), VerbNet (http://verbs.colorado.edu/verbnet/index.html), PropBank (http://verbs.colorado.edu/%7Empalmer/projects/ace.html), Ontonotes (http://verbs.colorado.edu/html_groupings/), and English Wordnet (https://wordnet.princeton.edu/). Part of the dataset are files reflecting interannotator agreement.
The SynSemClass 3.5 synonym verb lexicon investigates semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English and German-English language resources, i.e., relates verb meanings with respect to contextually-based verb synonymy.
The Czech lexicon entries are linked to PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), Vallex (http://hdl.handle.net/11234/1-3524), and CzEngVallex (http://hdl.handle.net/11234/1-1512).
The English lexicon entries are linked to EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2), CzEngVallex (http://hdl.handle.net/11234/1-1512), FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), VerbNet (https://uvi.colorado.edu/ and http://verbs.colorado.edu/verbnet/index.html), PropBank (http://propbank.github.io/), Ontonotes (http://clear.colorado.edu/compsem/index.php?page=lexicalresources&sub=ontonotes), and English Wordnet (https://wordnet.princeton.edu/).
The German lexicon entries are linked to Woxikon (https://synonyme.woxikon.de), E-VALBU (https://grammis.ids-mannheim.de/verbvalenz), and GUP (http://alanakbik.github.io/multilingual.html; https://github.com/UniversalDependencies/UD_German-GSD).
The SynSemClass synonym verb lexicon version 4.0 investigates, with respect to contextually-based verb synonymy, semantic ‘equivalence’ of Czech, English, and German verb senses and their valency behavior in parallel Czech-English and German-English language resources. SynSemClass 4.0 is a multilingual event-type ontology based on classes of synonymous verb senses, complemented with semantic roles and links to existing semantic lexicons. Version 4.0 is enriched by additional classes, and some classes have been merged in the context of the content hierarchy. Compared to the older versions of the lexicon, the novelties are the definitions of classes and the definitions of roles.
Czech lexicon entries are linked to PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), Vallex (http://hdl.handle.net/11234/1-3524), and CzEngVallex (http://hdl.handle.net/11234/1-1512). The English lexicon entries are linked to EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2), CzEngVallex (http://hdl.handle.net/11234/1-1512), FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), VerbNet (https://uvi.colorado.edu/ and http://verbs.colorado.edu/verbnet/index.html), PropBank (http://propbank.github.io/), Ontonotes (http://clear.colorado.edu/compsem/index.php?page=lexicalresources&sub=ontonotes), and English Wordnet (https://wordnet.princeton.edu/). The German lexicon entries are linked to Woxikon (https://synonyme.woxikon.de), E-VALBU (https://grammis.ids-mannheim.de/verbvalenz), and GUP (http://alanakbik.github.io/multilingual.html; https://github.com/UniversalDependencies/UD_German-GSD).
The SynSemClass synonym verb lexicon version 5.0 is a multilingual resource that enriches previous editions of this event-type ontology with a new language, Spanish. The existing languages, English, Czech and German, are further substantially extended by a larger number of classes. SSC 5.0 data also contain lists (in a separate removed_cms.zip file) with originally (pre-)proposed but later rejected class members. All languages are organized into classes and have links to other lexical sources. In addition to the existing links, links to Spanish sources have been added.
The Spanish entries are linked to
ADESSE (http://adesse.uvigo.es/),
Spanish SenSem (http://grial.edu.es/sensem/lexico?idioma=en),
Spanish WordNet (https://adimen.si.ehu.es/cgi-bin/wei/public/wei.consult.perl),
AnCora (https://clic.ub.edu/corpus/en/ancoraverb_es), and
Spanish FrameNet (http://sfn.spanishfn.org/SFNreports.php).
The English entries are linked to
EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2),
CzEngVallex (http://hdl.handle.net/11234/1-1512),
FrameNet (https://framenet.icsi.berkeley.edu/),
VerbNet (https://uvi.colorado.edu/ and http://verbs.colorado.edu/verbnet/index.html),
PropBank (http://propbank.github.io/),
Ontonotes (http://clear.colorado.edu/compsem/index.php?page=lexicalresources&sub=ontonotes), and
English Wordnet (https://wordnet.princeton.edu/).
Czech entries are linked to
PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F),
Vallex (http://hdl.handle.net/11234/1-3524), and
CzEngVallex (http://hdl.handle.net/11234/1-1512).
The German entries are linked to
Woxikon (https://synonyme.woxikon.de),
E-VALBU (https://grammis.ids-mannheim.de/verbvalenz), and
GUP (http://alanakbik.github.io/multilingual.html and https://github.com/UniversalDependencies/UD_German-GSD).
The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language resources, i.e., relating verb meanings with respect to contextually-based verb synonymy. The lexicon entries are linked to PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2), CzEngVallex (http://hdl.handle.net/11234/1-1512), FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), VerbNet (https://uvi.colorado.edu/ and http://verbs.colorado.edu/verbnet/index.html), PropBank (http://propbank.github.io/), Ontonotes (http://clear.colorado.edu/compsem/index.php?page=lexicalresources&sub=ontonotes), and English Wordnet (https://wordnet.princeton.edu/).
Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding.
The current version of the UDer collection contains eleven harmonized resources covering eleven different languages.
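The rooted tree structure described above can be pictured with a small Python sketch. It is not the UDer file format or an API of the collection: the class `Lexeme`, its fields, the method names and the English example family are all assumptions chosen for illustration. Each node is a lexeme with at most one parent, and each edge stands for one derivational relation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Lexeme:
    lemma: str
    pos: str
    parent: Optional["Lexeme"] = field(default=None, repr=False)
    children: List["Lexeme"] = field(default_factory=list, repr=False)

    def derive(self, lemma: str, pos: str) -> "Lexeme":
        """Add a derived lexeme as a child node and return it."""
        child = Lexeme(lemma, pos, parent=self)
        self.children.append(child)
        return child

    def root(self) -> "Lexeme":
        """Follow parent links up to the root of the derivational family."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def subtree(self) -> Iterator["Lexeme"]:
        """Yield this lexeme and everything derived from it, depth first."""
        yield self
        for child in self.children:
            yield from child.subtree()


# Hypothetical English family: read -> reader, read -> readable -> readability.
read = Lexeme("read", "VERB")
reader = read.derive("reader", "NOUN")
readable = read.derive("readable", "ADJ")
readability = readable.derive("readability", "NOUN")

family = [lexeme.lemma for lexeme in readability.root().subtree()]
```

Giving every lexeme a single parent is what makes the structure a tree rather than a general graph, so each derivational family has exactly one root.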
Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains twenty-seven harmonized resources covering twenty different languages.
Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains thirty-one harmonized resources covering twenty-one different languages.
Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation scheme for many languages. The annotation scheme consists of simple tab-separated columns that store a word and its morphological segmentations, including pieces of information about the word and the segmented units, e.g., part-of-speech categories, type of morphs/morphemes etc. The current public version of the collection contains 38 harmonised segmentation datasets covering 30 different languages.
The Valency Lexicon of Czech Verbs, Version 2.5 (VALLEX 2.5), is a collection of linguistically annotated data and documentation, resulting from an attempt at formal description of valency frames of Czech verbs. VALLEX 2.5 has been developed at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague.
VALLEX 2.5 provides information on the valency structure (combinatorial potential) of verbs in their particular senses; there are roughly 2,730 lexeme entries containing together around 6,460 lexical units ("senses"). LC 536 - Center for Computational Linguistics, 1ET100300517 and 1ET101120503.
VALLEX 4.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses; each sense is characterized by a gloss and examples. VALLEX 4.0 describes almost 4 700 Czech verbs in more than 11 000 lexical units, i.e., given verbs in the given senses. VALLEX 4.0 is a collection of linguistically annotated data and documentation, resulting from an attempt at formal description of valency frames of Czech verbs. In order to satisfy different needs of different potential users, the lexicon is distributed (i) in an HTML version (the data allows for an easy and fast navigation through the lexicon) and (ii) in a machine-tractable form, so that the VALLEX data can be used in NLP applications. In addition to the information from previous versions, VALLEX 4.0 also provides characteristics of verbs expressing reciprocity and reflexivity.
The data is provided in two formats: XML and JSON.
VALLEX 4.5 provides information on the valency structure (combinatorial potential) of Czech verbs in their particular senses (almost 4 700 verbs in more than 11 080 lexical units, supplemented with more than 290 nouns in more than 350 lexical units forming complex predicates with light verbs). VALLEX 4.5 is an enhanced successor of VALLEX 3.0, 3.5, and 4.0. In addition to the information stored there, VALLEX 4.5 provides a detailed description of reflexive verbs, i.e., verbs with the reflexive "se" or "si" as an obligatory part of their verb lexemes. VALLEX 4.5 covers 1 525 reflexive verbs in 1 545 lexical units (2 501 when aspectual counterparts are counted separately). In order to satisfy different needs of different potential users, the lexicon is distributed (i) online in an HTML version (the data allows for an easy and fast navigation through the lexicon) and (ii) in this distribution in a machine-tractable form, so that the VALLEX data can be used in NLP applications.
VPS-30-En is a small lexical resource that contains the following 30 English verbs: access, ally, arrive, breathe, claim, cool, crush, cry, deny, enlarge, enlist, forge, furnish, hail, halt, part, plough, plug, pour, say, smash, smell, steer, submit, swell, tell, throw, trouble, wake and yield. We have created and have been using VPS-30-En to explore the interannotator agreement potential of the Corpus Pattern Analysis. VPS-30-En is a small snapshot of the Pattern Dictionary of English Verbs (Hanks and Pustejovsky, 2005), which we revised (both the entries and the annotated concordances) and enhanced with additional annotations.
This work has been partly supported by the Ministry of Education of CR within the LINDAT-Clarin project LM2010013, and by the Czech Science Foundation under the projects P103/12/G084, P406/2010/0875 and P401/10/0792.