POS tagger. The input file must be plain text (file.txt) and UTF-8 encoded. Disambiguation is performed by a TreeTagger instance trained by the IULA.
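As a minimal sketch of the input requirement above, the following Python function checks whether a file decodes cleanly as UTF-8 before it is submitted to the tagger. The function name `is_utf8_text` is hypothetical and is not part of the tool itself.

```python
from pathlib import Path


def is_utf8_text(path):
    """Return True if the file at `path` decodes cleanly as UTF-8.

    Hypothetical helper for illustration only; the tagger does not
    ship this function.
    """
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A file saved in another encoding, such as Latin-1 text containing accented characters, would typically fail this check and would need to be converted first.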
A tool for statistical corpus exploitation. It offers concordances, counts n-grams, extracts collocations, and provides association, distribution and similarity measures.
A tool for contrasting terminological vocabularies and textual corpora. It checks the presence and location of reference vocabularies in textual corpora.
Ted Pedersen's Ngram Statistics Package (used to identify word Ngrams that appear in large corpora using standard tests of association such as Fisher's exact test, the log likelihood ratio, Pearson's chi-squared test, the Dice Coefficient, etc.).
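To illustrate the kind of association test named above, here is a minimal Python sketch that scores one bigram with the Dice coefficient and the log likelihood ratio, both computed from a 2x2 contingency table of bigram counts. This is not the package's own code; the function name `bigram_scores` and the toy sentence are assumptions made for the example.

```python
import math
from collections import Counter


def bigram_scores(tokens, w1, w2):
    """Return (dice, log_likelihood) for the bigram (w1, w2).

    Illustrative sketch only; not taken from the Ngram Statistics Package.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    if n == 0:
        return 0.0, 0.0

    counts = Counter(bigrams)
    n11 = counts[(w1, w2)]                        # w1 followed by w2
    n1p = sum(1 for a, _ in bigrams if a == w1)   # w1 in first position
    np1 = sum(1 for _, b in bigrams if b == w2)   # w2 in second position

    # Dice coefficient: 2 * joint count / sum of marginal counts.
    dice = 2 * n11 / (n1p + np1) if (n1p + np1) else 0.0

    # Observed and expected cell counts of the 2x2 contingency table.
    observed = [n11, n1p - n11, np1 - n11, n - n1p - np1 + n11]
    expected = [
        n1p * np1 / n,
        n1p * (n - np1) / n,
        (n - n1p) * np1 / n,
        (n - n1p) * (n - np1) / n,
    ]

    # Log likelihood ratio: 2 * sum(o * ln(o / e)) over non-empty cells.
    g2 = 2 * sum(o * math.log(o / e)
                 for o, e in zip(observed, expected) if o > 0)
    return dice, g2


tokens = "new york is big and new york is old".split()
dice, g2 = bigram_scores(tokens, "new", "york")
```

Higher scores indicate that the two words co-occur more often than their individual frequencies would suggest. On a corpus this small the values are only illustrative.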
A package of tools for processing the Corpus Tècnic in Catalan and Spanish. It includes a preprocessor, a POS tagger and a linguistic disambiguator.