Search Results
Creator:
Gurevych, Iryna , Habernal, Ivan , and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl , Creative Commons , Web corpus , and Amazon Web Services
Language:
Afrikaans , Arabic , Bengali , Bulgarian , Czech , Danish , German , Modern Greek (1453-) , English , Estonian , Persian , Finnish , French , Gujarati , Hebrew , Hindi , Croatian , Hungarian , Indonesian , Italian , Japanese , Kannada , Korean , Latvian , Lithuanian , Malayalam , Marathi , Macedonian , Nepali (macrolanguage) , Dutch , Norwegian , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Somali , Spanish , Albanian , Swahili (macrolanguage) , Swedish , Tamil , Telugu , Tagalog , Thai , Turkish , Ukrainian , Undetermined , Urdu , Vietnamese , and Chinese
Description:
A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
Rights:
Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) , http://creativecommons.org/licenses/by-nc-nd/4.0/ , and PUB
Creator:
Galuščáková, Petra , Pecina, Pavel , Hoffmannová, Petra , Hajič, Jan , Ircing, Pavel , and Švec, Jan
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
audio and corpus
Subject:
annotated corpus , corpus , speech corpus , annotation , audio , and multilingual
Language:
Czech , English , French , German , and Spanish
Description:
The package contains Czech recordings of the Visual History Archive, which consists of interviews with Holocaust survivors. The archive consists of audio recordings, four types of automatic transcripts, manual annotations of selected topics, and interview metadata. In total, the archive contains 353 recordings and 592 hours of interviews.
Rights:
Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) , http://creativecommons.org/licenses/by-nc-nd/4.0/ , and PUB
Creator:
Guillou, Liane , Hardmeier, Christian , Nakov, Preslav , Stymne, Sara , Tiedemann, Jörg , Versley, Yannick , Cettolo, Mauro , Webber, Bonnie , and Popescu-Belis, Andrei
Publisher:
Uppsala University
Type:
text and corpus
Subject:
machine translation , coreference , discourse , and pronouns
Language:
English , French , and German
Description:
Files for the DiscoMT 2016 shared task on cross-lingual pronoun prediction.
Rights:
Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) , http://creativecommons.org/licenses/by-nc-nd/4.0/ , and PUB
Creator:
Loáiciga, Sharid , Stymne, Sara , Nakov, Preslav , Hardmeier, Christian , Tiedemann, Jörg , Cettolo, Mauro , and Versley, Yannick
Publisher:
Uppsala University
Type:
text and corpus
Subject:
machine translation , discourse , coreference , and pronouns
Language:
English , Spanish , German , and French
Description:
Data used in the 2017 shared task on cross-lingual pronoun prediction.
Rights:
Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) , http://creativecommons.org/licenses/by-nc-nd/4.0/ , and PUB