Search Results
Type:
corpus
Language:
Arabic, Danish, Dutch, English, German, Modern Greek (1453-), Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish
Description:
Large set of subtitles available for download in multiple languages. Can be used as a parallel corpus.
Rights:
Not specified
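Subtitles for the same film in two languages can be paired by their timing, which is one common way to turn them into a parallel corpus. The following is a minimal sketch, assuming each subtitle cue has already been parsed into a start time, an end time (in seconds) and its text. The `Cue` class, the `align_by_time` function and the `min_ratio` threshold are hypothetical names for illustration, not part of this resource. Each source cue is paired with the target cue it overlaps most in time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # start time in seconds (assumed already parsed)
    end: float    # end time in seconds
    text: str


def overlap(a: Cue, b: Cue) -> float:
    """Length in seconds of the time interval shared by two cues (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def align_by_time(src: list[Cue], tgt: list[Cue],
                  min_ratio: float = 0.5) -> list[tuple[str, str]]:
    """Pair each source cue with the target cue it overlaps most.

    A pair is kept only if the overlap covers at least `min_ratio`
    of the shorter of the two cues (hypothetical threshold).
    """
    pairs = []
    for s in src:
        best = max(tgt, key=lambda t: overlap(s, t), default=None)
        if best is None:
            continue  # no target cues at all
        shorter = min(s.end - s.start, best.end - best.start)
        if shorter > 0 and overlap(s, best) / shorter >= min_ratio:
            pairs.append((s.text, best.text))
    return pairs


en = [Cue(1.0, 3.0, "Hello."), Cue(3.5, 6.0, "How are you?")]
de = [Cue(1.1, 2.9, "Hallo."), Cue(3.6, 6.2, "Wie geht's?")]
print(align_by_time(en, de))
```

Real subtitle files often drift or are shifted relative to each other, so a plain overlap check like this is only a starting point for alignment.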
Publisher:
Center for Sprogteknologi, University of Copenhagen
Type:
toolService
Language:
Danish, Dutch, English, German, Modern Greek (1453-), Icelandic, Norwegian, Russian, Slovenian, and Swedish
Description:
(1) Fully automatic, rule-based lemmatization of inflected languages; (2) fully automatic training of lemmatization rules from a list of full forms and their lemmas
Rights:
Not specified
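The idea of training lemmatization rules from a full-form/lemma list can be illustrated with a deliberately simplified suffix-replacement learner. This is not the algorithm used by the tool described above; it is a minimal sketch, and the functions `extract_rule`, `train` and `lemmatize` are hypothetical. Each training pair yields one rule (strip the common prefix, keep the differing endings); at lookup time the longest matching form ending wins, and ties go to the most frequent rewrite.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from os.path import commonprefix


def extract_rule(form: str, lemma: str) -> tuple[str, str]:
    """Strip the longest common prefix; the leftovers form a suffix rewrite rule."""
    n = len(commonprefix([form, lemma]))
    return form[n:], lemma[n:]


def train(pairs):
    """Count how often each full-form ending is rewritten to each lemma ending."""
    rules = defaultdict(Counter)
    for form, lemma in pairs:
        old, new = extract_rule(form, lemma)
        rules[old][new] += 1
    return rules


def lemmatize(word: str, rules) -> str:
    """Apply the rule whose form ending is the longest suffix of `word`."""
    for k in range(len(word), -1, -1):  # longest suffix first, down to ""
        suffix = word[len(word) - k:]
        if suffix in rules:
            new = rules[suffix].most_common(1)[0][0]
            return word[:len(word) - k] + new
    return word  # unreachable once an empty-ending rule exists, kept as a fallback


pairs = [("walked", "walk"), ("talked", "talk"), ("cats", "cat"),
         ("dogs", "dog"), ("went", "go"), ("walk", "walk")]
rules = train(pairs)
print(lemmatize("jumped", rules), lemmatize("birds", rules), lemmatize("went", rules))
```

Because rules are keyed only on the differing ending, this sketch over-generalizes (for example, any word ending in "s" loses it); a practical learner would condition rules on more of the surrounding context.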
Publisher:
Department of Languages, University of Jyväskylä
Type:
corpus
Language:
Dutch, Finnish, and Russian
Description:
A corpus of spontaneous discussions and read-aloud performances from native speakers of different ages. Parallel corpus in Russian, Finnish, and Dutch.
Rights:
Not specified
Type:
corpus
Language:
Danish, Dutch, English, Finnish, French, German, Italian, Latin, Portuguese, Russian, Spanish, Swedish, and Telugu
Description:
Free electronic books available for download or online browsing; German-language literature makes up only part of the available e-books.
Rights:
Not specified
Publisher:
University of Stuttgart
Type:
toolService
Subject:
POS tagger
Language:
Bulgarian, Dutch, English, French, German, Modern Greek (1453-), Italian, Portuguese, Russian, Spanish, and Swahili (macrolanguage)
Description:
A part-of-speech tagger and lemmatizer for several languages.
Rights:
Not specified
Publisher:
University of Leipzig
Type:
corpus
Language:
Afrikaans, Albanian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Malay (macrolanguage), Norwegian, Occitan (post 1500), Romanian, Russian, Slovak, Slovenian, Spanish, Sundanese, Swedish, Tagalog, Turkish, Vietnamese, and Welsh
Description:
Collected from newspaper texts, web crawls, and other sources. Includes words (with frequencies), co-occurrences (with graphs), left and right neighbours, and example sentences.
Rights:
Not specified
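The word-level data listed for this corpus (frequencies, co-occurrences, left and right neighbours) can be derived from tokenized sentences. The following is a minimal sketch, assuming the text has already been split into sentences and tokens; `corpus_stats` is a hypothetical function. It computes raw counts only and does not reproduce whatever significance measures the published resource may use to rank co-occurrences.

```python
from collections import Counter
from itertools import combinations


def corpus_stats(sentences):
    """Return word frequencies, sentence co-occurrences, and neighbour counts.

    `sentences` is a list of token lists (tokenization assumed done upstream).
    """
    freq = Counter()   # word -> number of occurrences
    cooc = Counter()   # (word_a, word_b), alphabetically ordered -> sentences containing both
    left = Counter()   # (word, word immediately to its left) -> count
    right = Counter()  # (word, word immediately to its right) -> count
    for tokens in sentences:
        freq.update(tokens)
        # Each unordered pair of distinct words is counted once per sentence.
        for a, b in combinations(sorted(set(tokens)), 2):
            cooc[(a, b)] += 1
        # Adjacent token pairs give left and right neighbours.
        for prev, nxt in zip(tokens, tokens[1:]):
            right[(prev, nxt)] += 1
            left[(nxt, prev)] += 1
    return freq, cooc, left, right


sents = [["the", "cat", "sat"], ["the", "dog", "sat"]]
freq, cooc, left, right = corpus_stats(sents)
print(freq["the"], cooc[("sat", "the")], left[("sat", "cat")])
```

Counting each pair once per sentence keeps repeated words from inflating sentence-level co-occurrence, while the neighbour counters keep every adjacent occurrence.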