An XML-based file containing the electronic version of al wassit dictionary. An Arabic monolingual dictionary accomplished by the Academy of the Arabic Language in Cairo
An LMF conformant XML-based file containing the electronic version of al wassit dictionary. An Arabic monolingual dictionary accomplished by the Academy of the Arabic Language in Cairo
Phonological networks are representations of word forms and their phonological relationships with other words in a given language lexicon. A principle underlying the growth (or evolution) of those networks is preferential attachment, or the ‘rich-gets-richer’ mechanisms, according to which words with many phonological neighbors (or links) are the main beneficiaries of future growth opportunities. Due to their limited number of words, language lexica constitute node-constrained networks where growth cannot keep increasing in a linear way; hence, preferential attachment is likely mitigated by certain factors. The present study investigated aging effects (i.e., a word’s finite time span of being active in terms of growth) in an evolving phonological network of English as a second language. It was found that phonological neighborhoods are constructed by one large initial lexical spurt, followed by sublinear growth spurts that eventually lead to very limited growth in later lexical spurts during network evolution, all the while obeying the law of preferential attachment. An analysis of the strength of phonological relationships between phonological word forms revealed a tendency to attach more distant phonological neighbors in the lower proficiency levels, while phonologically more similar neighbors enter phonological neighborhoods at more advanced levels of English as a second language. Overall, the findings suggest an aging effect in growth that favors younger words. In addition, beginning learners seem to prefer the acquisition of phonological neighbors that are easier to discriminate. Implications for the second language lexicon include leveraged learning mechanisms, learning bouts focussed on a smaller range of phonological segments, and involve questions concerning lexical processing in aging networks.
Annotated corpus of 350 decision of Czech top-tier courts (Supreme Court, Supreme Administrative Court, Constitutional Court).
Every decision is annotated by two trained annotators and then manually adjudicated by one trained curator to solve possible disagreements between annotators. Adjudication was conducted non-destructively, therefore dataset contains all original annotations.
Corpus was developed as training and testing material for reference recognition tasks. Dataset contains references to other court decisions and literature. All references consist of basic units (identifier of court decision, identification of court issuing referred decision, author of book or article, title of book or article, point of interest in referred document etc.), values (polarity, depth of discussion etc.).
Annotated corpus of 350 decision of Czech top-tier courts (Supreme Court, Supreme Administrative Court, Constitutional Court).
Every decision is annotated by two trained annotators and then manually adjudicated by one trained curator to solve possible disagreements between annotators. Adjudication was conducted non-destructively, therefore corpus (raw) contains all original annotations.
Corpus was developed as training and testing material for reference recognition tasks. Dataset contains references to other court decisions and literature. All references consist of basic units (identifier of court decision, identification of court issuing referred decision, author of book or article, title of book or article, point of interest in referred document etc.), values (polarity, depth of discussion etc.).
We defined 58 dramatic situations and annotated them in 19 play scripts. Then we selected only 5 well-recognized dramatic situations and annotated further 33 play scripts. In this version of the data, we release only play scripts that can be freely distributed, which is 9 play scripts. One play is annotated independently by three annotators.
A XML-based file containing all Arabic characters (letters, vowels and punctuations). Each character described with a description, different displays (isolated, at the beginning, middle and the end of a word), a codification (Unicode, others could be added later), and two transliterations (Buckwalter and wiki)
An annotated corpus dedicated to the benchmark and evaluation of Arabic morphological analyzers. It consists of 100 words with all their possible analysis. The corpus contains several morphological information such as stem, pattern, root, lemma, etc.