Written German from 1920 to 1939. 500,000 tokens in 392 texts, annotated with part-of-speech tags and lemmas and encoded in TEI XML. Part of Das Digitale Wörterbuch der deutschen Sprache des 20. Jahrhunderts (DWDS).
Articles from the online edition of the 'Berliner Zeitung' from 3 January 1994 to 31 December 2005. About 252 million tokens in 869,000 articles. Part of the DWDS project.
A corpus of dialect speech from Tyneside in North-East England, comprising digitized audio, standard orthographic transcriptions, phonetic transcriptions, and part-of-speech tagging.
The C4 corpus is a joint effort of the project Digitales Wörterbuch der deutschen Sprache (DWDS), the Austrian Academy Corpus (AAC), the Korpus Südtirol and the Schweizer Textkorpus (CHTK). It is composed of corpora from all four partner institutions.
Corpus of the weekly Die Zeit from 1946 to the present (complete runs from 1996 onward). Over 100 million words in 200,000 articles. Updated daily. Part of the DWDS project.