Ted Pedersen's Ngram Statistics Package, used to identify word n-grams in large corpora with standard tests of association such as Fisher's exact test, the log-likelihood ratio, Pearson's chi-squared test and the Dice coefficient.
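As an illustration of one of these association measures, the following minimal Python sketch computes the Dice coefficient for adjacent word pairs. It is not code from the package itself; the function name `dice_scores` and the sample sentence are hypothetical, and the counts are taken over bigram positions (how often a word occurs first or second in a pair).

```python
from collections import Counter


def dice_scores(tokens):
    """Return {(w1, w2): Dice coefficient} for every adjacent word pair.

    Dice(w1, w2) = 2 * n(w1 w2) / (n(w1 first) + n(w2 second)),
    where the counts are taken over bigram positions.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(bigrams)
    first_counts = Counter(w1 for w1, _ in bigrams)   # word in first position
    second_counts = Counter(w2 for _, w2 in bigrams)  # word in second position
    return {
        (w1, w2): 2 * n / (first_counts[w1] + second_counts[w2])
        for (w1, w2), n in pair_counts.items()
    }


# Hypothetical toy corpus, already tokenised.
tokens = "new york is big and new york is old".split()
scores = dice_scores(tokens)

# Print the pairs from most to least strongly associated.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A score of 1 means the two words only ever occur together in that order; lower scores indicate that at least one of them also appears in other pairs.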
A package of tools for processing the Corpus Tècnic in Catalan and Spanish. It includes a preprocessor, a POS tagger and a linguistic disambiguator.
It calculates the term frequency (TF) and the inverse document frequency (IDF) of a word in a given corpus. Combined as TF-IDF, they give a statistical measure of how important a word is to a document in a collection or corpus.
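A minimal Python sketch of this measure, assuming the common weighting tf × log(N / df); the tool described here may use a different variant. The function name `tf_idf` and the three-document corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """TF-IDF of `term` in `doc`, where `corpus` is a list of token lists.

    tf  = occurrences of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    if not doc:
        return 0.0
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0  # the term does not occur anywhere in the corpus
    return tf * math.log(len(corpus) / df)


# Hypothetical toy corpus, already tokenised.
corpus = [
    "the cell divides".split(),
    "the cell membrane".split(),
    "the market grows".split(),
]

print(tf_idf("the", corpus[0], corpus))       # occurs in every document
print(tf_idf("membrane", corpus[1], corpus))  # occurs in one document only
```

A word that occurs in every document, such as "the", scores zero, while a word confined to a single document scores highest there.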
An electronic version of a vocabulary that resulted from the collaboration with the Labour Department. Its nomenclature includes more than 1,000 terms, and it also contains six thematic annexes and a Catalan-Spanish index.