Search Results
Type:
corpus
Language:
Arabic, Danish, Dutch, English, German, Modern Greek (1453-), Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish
Description:
Large set of subtitles, downloadable in multiple languages. Can be used as a parallel corpus.
Rights:
Not specified
Publisher:
King's College London
Format:
application/tei+xml
Type:
corpus
Language:
English
Description:
Charters written in Anglo-Saxon England before A.D. 900, marked up in TEI XML. Browsable online.
Rights:
Not specified
Type:
corpus
Language:
English
Description:
Electronic texts, corpora, lexicons, and other resources.
Rights:
Not specified
Publisher:
Center of Computational Linguistics, Vytautas Magnus University
Format:
application/xml
Type:
corpus
Language:
Czech, English, and Lithuanian
Description:
A collection of parallel corpora: English-Lithuanian (2m words), Lithuanian-English (0.06m words), Czech-Lithuanian (0.8m words), Lithuanian-Czech (0.02m words). All the corpora are searchable online through a single interface at http://donelaitis.vdu.lt/main_en.php?id=4&nr=1_2. The corpus is still being updated with new texts.
Rights:
Not specified
Type:
corpus
Subject:
German studies
Language:
Chinese, Czech, English, French, German, Latin, and Spanish
Description:
Digital copies (page images) of historical botanical papers from the Missouri Botanical Garden Library. German-language texts make up only part of the collection.
Rights:
Not specified
Publisher:
Coventry University, University of Reading, University of Warwick
Format:
application/tei+xml
Type:
corpus
Language:
English
Description:
Transcribed recordings of 160 lectures and 39 seminars held in university departments. Four broad disciplinary groups, 1,644,942 tokens in total.
Rights:
Not specified
Type:
corpus
Language:
English
Description:
General reference corpus; 100 million words; POS, lemma, descriptive metadata
Rights:
Not specified
Publisher:
Research Group in Computational Linguistics, University of Wolverhampton
Type:
corpus
Language:
English
Description:
Sentences annotated to mark the units of text that are important for summarisation. 145,473 words / 6,584 sentences.
Rights:
Not specified
Publisher:
University College, Cork
Format:
application/tei+xml
Type:
corpus
Language:
English, Irish, and Latin
Description:
Searchable online corpus of multilingual texts of Irish literature and history.
Rights:
Not specified
Type:
corpus
Language:
English
Description:
British English (London); spoken, general, age-specific dialect corpus; 500,000 words, 55 hours of recording; POS tags, speaker/conversation metadata
Rights:
Not specified