Human post-edited test sentences for the WMT 2017 Automatic Post-Editing task. The set consists of 2,000 German sentences from the IT domain, already tokenized. Source and target segments can be downloaded from: https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2133. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Human post-edited and reference test sentences for the En-De PBSMT WMT 2018 Automatic Post-Editing task. Each file consists of 2,000 German sentences from the IT domain, already tokenized. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Chronology of German literature (Old High German, Middle High German, Early New High German, and New High German literature)
Digital, morphologically annotated (N, V, A) part of the Bonn Corpus of Early New High German; served as the material basis for volumes 3 (nouns), 4 (verbs), and 6 (adjectives) of the "Grammatik des Frühneuhochdeutschen"
Digitized page images of historical botanical writings from the Missouri Botanical Garden Library; German-language texts make up only part of the collection
A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.