A vocabulary resulting from the cooperation of the groups of REALITER network that collects the basic terminology mostly used in texts about Genomics. It contains equivalents in English, Peninsular and Latinamerican Spanish, French, Italian, Galician, Portuguese and Catalan.
Statistical analysis service: It calculates P(cue|class): probability of seeing a linguistic cue given a lexical class. This probability is computed given the occurrences of cues in a corpus (codified in the signatures file) and the information of belonging or not belonging of these words to different classes (codified in indicators file).
The probability is computed for each studied cue in the signatures file and for each class in the indicators file.
This RESTful service allows to define a sub-corpus from different annotated corpora. The service includes a POS tag harmonisation process where original tags are converted to EAGLES/Parole format. The eventual sub-corpus is indexed using the IMS CWB tool. The user receives an ID which can be used by the CQP service to exploit the sub-corpus.
This RESTful service accesses part of the Hemeroteca Digital de l’Arxiu Municipal de Girona (digital press archive from the Girona city council), specifically Catalan press from 2003. The service uses the SRU protocol.
Search engine for the neologisms database of the NEOROM network. The network collects neologisms used in the press written in Romance languages from 2005 onwards.
Oral corpus containing 166 narratives in English elicited by means of Labovian techniques. Participants from the UK (England, Wales, Scotland), Ireland, USA, Australia and South Africa.
The electronic version of the book “Corpus PAAU 1992: Descriptive Studies, Texts and Vocabulary” includes the texts that have been object of analysis in this project as well as the vocabulary lists that make up the Corpus 92.
domain specific corpus (Law, Economy, Computing, Medicine and Environment as well as a contrastive corpus from the press); EN 3.3 M tokens, SP 33 M tokens, CAT 19 M tokens; EAGLEs pos tagset