1. LC-STAR Dialogues Publisher: Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP) Type: corpus Subject: bilingual oral corpus Language: Catalan and Spanish Description: Bilingual oral corpus (55 hours of recordings). 77 Spanish speakers; 59 Catalan speakers. Environment: local telephone. Annotation: orthographic. Rights: Not specified
2. Wikicorpus Publisher: Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP) Type: corpus Subject: trilingual corpus Language: Catalan, English, and Spanish Description: Trilingual corpus (Catalan, Spanish, English) that contains large portions of Wikipedia (based on a 2006 dump) and has been automatically enriched with linguistic information. In its present version, it contains over 750 million words. Rights: Not specified