Search Results
Type:
corpus
Language:
Arabic, Danish, Dutch, English, German, Modern Greek (1453-), Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish
Description:
Large set of subtitles in multiple languages, available for download. Can be used as a parallel corpus.
Rights:
Not specified
Publisher:
Radboud University Nijmegen, Max Planck Institute for Psycholinguistics, Meertens Institute KNAW The Netherlands, and Babylon Centre for Studies of Multilingualism in the Multicultural Society
Type:
corpus
Language:
Arabic, Dutch, and Turkish
Description:
Audio recordings, transcripts,
Rights:
Not specified
Publisher:
Max Planck Institute for Psycholinguistics
Type:
corpus
Language:
Croatian, German, Russian, and Turkish
Description:
Language acquisition corpus.
Rights:
Not specified
Type:
corpus
Subject:
Multilingual access to interactive communication services for the Mediterranean and the Middle East
Language:
Modern Greek (1453-), Turkish, Arabic, and Hebrew
Description:
Collection of telephone speech databases from the Mediterranean region, including (variants of) Arabic. Each database contains 500-1000 speakers, all orthographically transcribed. Speaker information covers gender, age, and accent. Phonetic lexicons are included.
Rights:
Not specified
Type:
corpus
Language:
Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Chinese, Hebrew, Japanese, Korean, and Thai
Description:
28 speech databases containing broadband recordings from 550 adults and 50 children per language. Includes phonetically rich material, all orthographically transcribed. Speaker information covers gender, age, and accent. A pronunciation lexicon is included.
Rights:
Not specified
Publisher:
Natural Language Processing Group, Computer Science Department, Istanbul Technical University
Type:
toolService
Language:
Turkish
Description:
State-of-the-art Turkish NLP tools: preprocessing and normalization for social media text, morphology, syntax, and entity recognition.
Rights:
Not specified
Publisher:
University of Leipzig
Type:
corpus
Language:
Afrikaans, Albanian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Malay (macrolanguage), Norwegian, Occitan (post 1500), Romanian, Russian, Slovak, Slovenian, Spanish, Sundanese, Swedish, Tagalog, Turkish, Vietnamese, and Welsh
Description:
Collected from newspaper texts, web crawling, and other sources. Provides words (with frequencies), co-occurrences (with graphs), left and right neighbours, and example sentences.
Rights:
Not specified