The Thesaurus linguae Latinae is the first comprehensive dictionary of ancient Latin;
• it is compiled on the basis of all Latin texts surviving from antiquity (until AD 600), both literary and non-literary
• for less common words it cites every attestation, for the rest (those marked with an asterisk) an instructive and representative sample
• it records all meanings (including technical usages) and all constructions
• it documents peculiarities of inflection, spelling, and prosody
• it supplies information about the etymology of the Latin words and their survival in the Romance languages, contributed by recognised authorities in the fields of Indo-European and Romance studies
• it collects the comments of ancient sources on the word in question
The Thesaurus therefore offers for every Latin word a comprehensive, richly documented picture of its possibilities and history – not only for Latin scholars, but also for scholars of the various branches of ancient studies and for related disciplines.
An elegantly simple and robust machine-learning method based on a combination of ideas from a number of memory-based learning (MBL) implementations, resulting in a useful tool for NLP research.
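For illustration only, the core of any memory-based learner is k-nearest-neighbour classification over stored training examples. The following minimal Python sketch (hypothetical code, not the tool's actual implementation) shows the idea with a simple feature-overlap similarity metric:

```python
from collections import Counter

def overlap(a, b):
    """Number of matching feature values (simple overlap metric)."""
    return sum(1 for x, y in zip(a, b) if x == y)

def knn_classify(memory, instance, k=3):
    """Classify by majority vote among the k most similar stored examples.

    memory: list of (features, label) pairs kept verbatim, as in
    memory-based learning (no abstraction at training time).
    """
    neighbours = sorted(memory, key=lambda ex: overlap(ex[0], instance),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy POS-disambiguation example: features = (previous tag, word suffix)
memory = [
    (("DET", "ing"), "NOUN"),
    (("AUX", "ing"), "VERB"),
    (("DET", "er"),  "NOUN"),
    (("AUX", "ed"),  "VERB"),
]
print(knn_classify(memory, ("AUX", "ing"), k=3))  # -> "VERB"
```

The defining property of MBL is that training merely stores the examples; all generalisation happens at classification time through the similarity metric.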
A package of tools for Catalan and Spanish corpus processing. It includes a text-handling module and a probabilistic POS tagger. It also allows the POS tagger's dictionary data to be consulted.
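As a minimal sketch of what such a tagger's dictionary lookup involves (an assumption for illustration, not the package's actual model), a simple probabilistic tagger assigns each word the tag with the highest relative frequency in its dictionary entry:

```python
from collections import Counter, defaultdict

def train(tagged_corpus):
    """Build a tag-frequency dictionary from (word, tag) pairs."""
    lexicon = defaultdict(Counter)
    for word, tag in tagged_corpus:
        lexicon[word.lower()][tag] += 1
    return lexicon

def tag(lexicon, tokens, default="NOUN"):
    """Assign each token its most frequent tag, falling back to a default."""
    return [(t, lexicon[t.lower()].most_common(1)[0][0]
                if t.lower() in lexicon else default)
            for t in tokens]

corpus = [("la", "DET"), ("casa", "NOUN"), ("es", "VERB"), ("la", "DET")]
lexicon = train(corpus)
print(tag(lexicon, ["La", "casa"]))  # -> [('La', 'DET'), ('casa', 'NOUN')]
```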
TREQ exploits the knowledge embedded in parallel corpora and produces a set of translation equivalents (a translation lexicon) based on a 1:1 mapping hypothesis. The program uses almost no linguistic knowledge, relying on statistical evidence and some simplifying assumptions.
The extraction process is based on a testing approach: it first generates a list of candidate translation equivalents and then successively extracts the most likely translation-equivalence pairs. It does not require a pre-existing bilingual lexicon for the languages considered; if such a lexicon exists, however, it can be used to eliminate spurious candidate pairs, thereby speeding up the process and increasing its accuracy. The algorithm relies on some pre-processing of the bitext: sentence alignment, tokenisation (using MtSeg, http://www.lpl.univaix.fr/projects/multext/MtSeg), collocation extraction (unaware of translation equivalence), POS tagging, and lemmatisation. A sketch of the extraction step is given below.
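The following sketch (hypothetical code, not TREQ itself) illustrates the general idea: count word co-occurrences across aligned sentence pairs, score candidate pairs with an association measure, then greedily extract pairs under the 1:1 mapping hypothesis so that each word pairs off at most once. The Dice coefficient stands in here for whatever statistical test the actual system applies:

```python
from collections import Counter
from itertools import product

def extract_equivalents(bitext, min_count=2):
    """Greedy 1:1 extraction of translation equivalents from a bitext.

    bitext: list of (source_tokens, target_tokens) aligned sentence pairs.
    Candidates are scored with the Dice coefficient; other association
    measures (e.g. log-likelihood) plug in the same way.
    """
    pair_count, src_count, tgt_count = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        for s in set(src):
            src_count[s] += 1
        for t in set(tgt):
            tgt_count[t] += 1
        for s, t in product(set(src), set(tgt)):
            pair_count[(s, t)] += 1

    def dice(s, t):
        return 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])

    candidates = sorted(
        (p for p, c in pair_count.items() if c >= min_count),
        key=lambda p: dice(*p), reverse=True)

    lexicon, used_src, used_tgt = {}, set(), set()
    for s, t in candidates:                 # 1:1 mapping hypothesis:
        if s not in used_src and t not in used_tgt:  # each word used once
            lexicon[s] = t
            used_src.add(s)
            used_tgt.add(t)
    return lexicon

bitext = [(["casa", "roja"], ["red", "house"]),
          (["casa", "grande"], ["big", "house"])]
print(extract_equivalents(bitext, min_count=1))
# -> {'casa': 'house', 'roja': 'red', 'grande': 'big'}
```

In a realistic setting the tokens would be lemmas filtered by compatible POS tags, which is why the pre-processing steps listed above matter.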
More detailed descriptions are available in the following papers (http://www.racai.ro/~tufis/papers/):
-- Dan Tufiş and Ana-Maria Barbu (2002). Revealing Translators' Knowledge: Statistical Methods in Constructing Practical Translation Lexicons for Language and Speech Processing. International Journal of Speech Technology, volume 5, pp. 199-209. Kluwer Academic Publishers, November 2002. ISSN 1381-2416.
-- Dan Tufiş (2002). A Cheap and Fast Way to Build Useful Translation Lexicons. In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), pp. 1030-1036, Taipei, Taiwan, August 2002. ISBN 1-55860-894.
-- Dan Tufiş and Ana-Maria Barbu (2001). Automatic Construction of Translation Lexicons. In V. V. Kluew, C. E. D'Attellis, and N. E. Mastorakis (eds.), Advances in Automation, Multimedia and Video Systems, and Modern Computer Science, pp. 156-161. WSES Press, December 2001. ISSN 1790-5117.
-- Dan Tufiş and Ana-Maria Barbu (2001). Extracting Multilingual Lexicons from Parallel Corpora. In Proceedings of the ACH-ALLC Conference (ACH-ALLC 2001), New York, USA, June 2001.
-- Dan Tufiş and Ana-Maria Barbu (2001). Accurate Automatic Extraction of Translation Equivalents from Parallel Corpora. In Paul Rayson, Andrew Wilson, Tony McEnery, Andrew Hardie, and Shereen Khoja (eds.), Proceedings of the Corpus Linguistics 2001 Conference (CL 2001), pp. 581-586, Lancaster, UK, March 2001. Lancaster University, Computing Department. ISBN 1-86220-107-2.
Translog 2006 is the leading tool for analysing human text-production processes. It was originally designed for translation process research, but it can be used for a variety of learning, teaching, and research purposes.