Search Results
Type:
corpus
Language:
Arabic, Danish, Dutch, English, German, Modern Greek (1453-), Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish
Description:
A large set of subtitles available for download in multiple languages. Can be used as a parallel corpus.
Rights:
Not specified
Publisher:
Center for Sprogteknologi, University of Copenhagen
Type:
toolService
Language:
Danish, Dutch, English, German, Modern Greek (1453-), Icelandic, Norwegian, Russian, Slovenian, and Swedish
Description:
1) Fully automatic rule-based lemmatization of inflected languages. 2) Fully automatic training of lemmatization rules from a list of full form-lemma pairs.
Rights:
Not specified
Publisher:
Joint Research Centre of the EU
Type:
corpus
Language:
Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Hungarian, Italian, Latvian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish
Description:
The largest parallel corpus. Contains EU law (the Acquis Communautaire) in 22 languages.
Rights:
Not specified
Publisher:
Universität Bamberg, World Language Documentation Centre
Format:
application/octet-stream
Type:
lexicalConceptualResource
Language:
Afrikaans, Arabic, Basque, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, Modern Greek (1453-), Hebrew, Hungarian, Icelandic, Indonesian, Interlingua (International Auxiliary Language Association), Irish, Italian, Japanese, Khmer, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish, Turkish, Ukrainian, and Welsh
Rights:
GFDL or CC and http://www.omegawiki.org/Licensing
Type:
corpus
Language:
Danish, Dutch, English, Finnish, French, German, Modern Greek (1453-), Italian, and Spanish
Description:
Nine speech databases for training and testing multilingual speech recognition applications in the car environment. Contains parallel 4-channel in-car recordings and a GSM channel, with phonetically rich material. All recordings are orthographically transcribed. Speaker information is included for gender, age, and accent. Includes a pronunciation lexicon.
Rights:
Not specified
Publisher:
Center for Reading Research, Ghent University
Type:
lexicalConceptualResource
Language:
Chinese, Dutch, English, German, Modern Greek (1453-), and Spanish
Rights:
Not specified
Publisher:
University of Stuttgart
Type:
toolService
Subject:
POS tagger
Language:
Bulgarian, Dutch, English, French, German, Modern Greek (1453-), Italian, Portuguese, Russian, Spanish, and Swahili (macrolanguage)
Description:
A part-of-speech tagger and lemmatizer for several languages.
Rights:
Not specified