Search Results
Creator:
Gurevych, Iryna; Habernal, Ivan; and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
Language:
Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
Description:
A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
Rights:
Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
Creator:
Hardmeier, Christian; Tiedemann, Jörg; Nakov, Preslav; Stymne, Sara; and Versley, Yannick
Publisher:
Uppsala University
Type:
text and corpus
Subject:
machine translation, coreference resolution, anaphora resolution, and discourse
Language:
English and French
Description:
The data set includes training, development, and test data from the shared tasks on pronoun-focused machine translation and cross-lingual pronoun prediction at the EMNLP 2015 workshop on Discourse in Machine Translation (DiscoMT2015). The release also contains the submissions to the pronoun-focused machine translation task, the manual annotations used for the official evaluation, and gold-standard annotations of pronoun coreference for the shared task test set.
Rights:
Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
Creator:
Guillou, Liane; Hardmeier, Christian; Nakov, Preslav; Stymne, Sara; Tiedemann, Jörg; Versley, Yannick; Cettolo, Mauro; Webber, Bonnie; and Popescu-Belis, Andrei
Publisher:
Uppsala University
Type:
text and corpus
Subject:
machine translation, coreference, discourse, and pronouns
Language:
English, French, and German
Description:
Files for the DiscoMT 2016 shared task on cross-lingual pronoun prediction.
Rights:
Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
Creator:
Loáiciga, Sharid; Stymne, Sara; Nakov, Preslav; Hardmeier, Christian; Tiedemann, Jörg; Cettolo, Mauro; and Versley, Yannick
Publisher:
Uppsala University
Type:
text and corpus
Subject:
machine translation, discourse, coreference, and pronouns
Language:
English, Spanish, German, and French
Description:
Data used in the 2017 shared task on cross-lingual pronoun prediction.
Rights:
Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB