This resource is an Italian morphological dictionary for content words, encoded as a JSON Lines text file. It maps surface forms to lexical forms, each followed by grammatical features. The surface forms were generated algorithmically from stable phonological and morphological rules of Italian. Particular attention was given to verb generation, for which the rules were extracted from A. L. and G. Lepschy, La lingua italiana. With its broad coverage, the dictionary is particularly useful when used together with the Italian Function Words (http://hdl.handle.net/11372/LRT-2288) for tasks such as POS tagging or syntactic parsing.
This resource is the second version of an Italian morphological dictionary for content words, encoded as a JSON Lines text file. It maps surface forms to lexical forms, each followed by standard grammatical properties. Compared to the first release, this version has an improved JSON structure. The surface forms were generated algorithmically from stable phonological and morphological rules of Italian. Particular attention was given to verb generation, for which the rules were extracted from A. L. and G. Lepschy, La lingua italiana. With its broad coverage, the dictionary is particularly useful when used together with the Italian Function Words v2 (http://hdl.handle.net/11372/LRT-2629) for tasks such as POS tagging or syntactic parsing.
This resource is the third version of the Italian morphological dictionary for content words (http://hdl.handle.net/11372/LRT-2630), encoded in a JSON Lines format. Compared to the previous version, it contains some minor improvements.
This dictionary is a curated list of Italian function words in a JSON Lines text file, particularly useful for tasks such as POS tagging or syntactic parsing. It contains 999 single-word forms and 2501 multi-word forms. Each entry may have the following grammatical features: lemma, pos, mood, tense, person, number, gender, case, degree.
This dictionary is the second version of 11372/LRT-2288, a curated list of Italian function words in a JSON Lines text file, particularly useful for tasks such as POS tagging or syntactic parsing. It contains 999 single-word forms and 2501 multi-word forms. Each entry may have the following grammatical features: lemma, pos, mood, tense, person, number, gender, case, degree. Compared to the first release, this version has a clearer JSON structure.
This dictionary is the third version of 11372/LRT-2288, a curated list of Italian function words in a JSON Lines text file, particularly useful for tasks such as part-of-speech tagging or syntactic parsing. Compared to the previous release, this version includes some minor improvements.
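A JSON Lines file holds one JSON object per line, so a dictionary of this kind can be read line by line. The following is a minimal Python sketch, not the resource's documented schema: the feature names come from the description above, but their placement as top-level keys, the `form` key, and the sample lines are assumptions made for illustration.

```python
import json
from collections import Counter

# Feature names taken from the description; treating them as top-level
# JSON keys is an assumption, not something the record confirms.
FEATURES = ("lemma", "pos", "mood", "tense", "person",
            "number", "gender", "case", "degree")


def read_jsonl(lines):
    """Yield one parsed JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def feature_counts(entries):
    """Count how many entries carry each of the listed features."""
    counts = Counter()
    for entry in entries:
        if isinstance(entry, dict):
            counts.update(name for name in FEATURES if name in entry)
    return counts


# Hypothetical sample lines, invented for illustration only.
sample = [
    '{"form": "il", "lemma": "il", "pos": "DET", "number": "sing", "gender": "masc"}',
    '',
    '{"form": "di", "lemma": "di", "pos": "ADP"}',
]
entries = list(read_jsonl(sample))
counts = feature_counts(entries)
```

With a real file, `read_jsonl` can be given an open file handle instead of the sample list, since iterating a text file also yields one line at a time.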
Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains twenty-seven harmonized resources covering twenty different languages.
Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains thirty-one harmonized resources covering twenty-one different languages.
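The rooted-tree scheme described above can be sketched in Python as follows. This is an illustrative data structure only, not the UDer file format or API: the `Lexeme` class, its fields, and the English example family are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # identity-based equality avoids recursing through parent links
class Lexeme:
    """A tree node; the edge to `parent` stands for a derivational relation."""
    lemma: str
    pos: str
    parent: Optional["Lexeme"] = field(default=None, repr=False)
    children: List["Lexeme"] = field(default_factory=list, repr=False)

    def add_child(self, child: "Lexeme") -> "Lexeme":
        """Attach `child` as a lexeme derived from this one."""
        child.parent = self
        self.children.append(child)
        return child

    def root(self) -> "Lexeme":
        """Follow parent links up to the root of the derivational family."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def descendants(self) -> Iterator["Lexeme"]:
        """Yield every lexeme below this node in pre-order."""
        for child in self.children:
            yield child
            yield from child.descendants()


# Hypothetical English family, invented for illustration.
teach = Lexeme("teach", "VERB")
teacher = teach.add_child(Lexeme("teacher", "NOUN"))
teachable = teach.add_child(Lexeme("teachable", "ADJ"))
unteachable = teachable.add_child(Lexeme("unteachable", "ADJ"))
family = [node.lemma for node in teach.descendants()]
```

Because each node stores a single parent, every lexeme has at most one base, which is what keeps the structure a tree and not a general graph.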
Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation scheme for many languages. The annotation scheme consists of simple tab-separated columns that store a word and its morphological segmentations, together with information about the word and the segmented units, e.g., part-of-speech categories and types of morphs/morphemes. The current public version of the collection contains 38 harmonised segmentation datasets covering 30 different languages.
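A tab-separated layout of this kind can be read with Python's standard `csv` module. The sketch below is a rough illustration under stated assumptions: the column order (word, segmentation, part of speech), the ` + ` morph separator, and the sample rows are hypothetical, so the actual layout should be checked against the resource's own documentation.

```python
import csv
import io


def read_segmentations(handle, morph_sep=" + "):
    """Yield one dict per row of a tab-separated segmentation file.

    Assumes columns in the order word, segmentation, part of speech;
    rows with fewer than three columns are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue
        word, segmentation, pos = row[0], row[1], row[2]
        yield {
            "word": word,
            "morphs": segmentation.split(morph_sep),
            "pos": pos,
        }


# Hypothetical sample rows, invented for illustration only.
sample = io.StringIO(
    "teachers\tteach + er + s\tNOUN\n"
    "unkind\tun + kind\tADJ\n"
)
rows = list(read_segmentations(sample))
```

Setting `quoting=csv.QUOTE_NONE` keeps quotation marks inside words from being interpreted as field delimiters.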