Search Results
Publisher:
University of Tampere
Format:
application/octet-stream
Type:
corpus
Language:
Finnish and Russian
Description:
Juridical texts in Russian and Finnish arranged as a comparable text corpus
Rights:
Not specified
Publisher:
Institute of Mathematics and Computer Science, University of Latvia
Format:
text/plain
Type:
corpus
Subject:
balanced corpus
Language:
Latvian
Description:
Balanced corpus of Modern Latvian (~1 million running words, currently in plain text), publicly available via the Bonito interface
Rights:
Not specified
Publisher:
Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:
corpus
Subject:
oral corpus
Language:
Catalan
Description:
Oral corpus containing 10 sociolinguistic interviews carried out in La Canonja (Tarragona).
Rights:
Not specified
Publisher:
Radboud University Nijmegen
Type:
corpus
Subject:
Linguistics and language technology
Description:
The Corpus NGT is a collection of data from deaf signers using Sign Language of the Netherlands (NGT). The data consist of recordings with multiple synchronised video cameras, accompanied by gloss and translation annotations.
Rights:
Creative Commons BY-NC-SA 3.0 NL license (http://creativecommons.org/licenses/by-nc-sa/3.0/nl/)
Publisher:
Charles University
Type:
corpus
Language:
Czech
Description:
The Prague family of annotated corpora has a new member, the Czech Academic Corpus version 2.0 (CAC 2.0). CAC 2.0 consists of 650,000 words from various 1970s and 1980s newspapers, magazines and radio and television broadcast transcripts manually annotated for morphology and syntax.
Rights:
LDC Licence; LDC Catalog No. LDC2008T22
Publisher:
NBG/DBNL/INL; Nicoline van der Sijs
Type:
corpus
Language:
Dutch
Description:
Digitised version of the Delftse Bijbel 1477
Rights:
Not specified
Publisher:
University of Southampton and Newcastle University
Type:
corpus
Language:
French
Description:
Seven French L2 corpora. Digital sound files and related transcripts formatted using CHILDES software. The database currently contains over 4000 files (sound files, transcripts and morphosyntactically tagged transcripts).
Rights:
Not specified
Publisher:
IFA-groep, University of Amsterdam
Type:
corpus
Language:
Dutch
Description:
A video collection of spontaneous speech dialogues of 42 participants (14 male, 28 female)
Rights:
GNU GPL
Publisher:
Joint Research Centre of the EU
Type:
corpus
Language:
Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Modern Greek (1453-), Hungarian, Italian, Latvian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish
Description:
The largest parallel corpus, containing EU law (the Acquis Communautaire) in 22 languages.
Rights:
Not specified
Publisher:
Center for Dutch Language and Speech, University of Antwerp
Type:
corpus
Description:
Document classification (based on web-mining)
Rights:
Not specified