Search Results
Creator:
Gurevych, Iryna; Habernal, Ivan; and Zayed, Omnia
Publisher:
Technische Universität Darmstadt
Type:
text and corpus
Subject:
CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
Language:
Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
Description:
A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
Rights:
Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
Creator:
Javorský, Dávid; Macháček, Dominik; and Bojar, Ondřej
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
manual evaluation, simultaneous speech subtitling, Continuous Rating, and questionnaire evaluation
Language:
German and Czech
Description:
Data collected in a Continuous Rating evaluation study: Continuous Rating scores and questionnaires.
Rights:
Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
Creator:
Müller-Spitzer, Carolin and Ochs, Samira
Publisher:
IDS Mannheim
Type:
text and corpus
Subject:
gender-fair language, websites, personal designations, gender-inclusive language, and gender linguistics
Language:
German
Description:
Annotated dataset of personal designations found on the websites of 42 German, Austrian, Swiss, and South Tyrolean cities. Our goal is to re-evaluate the websites every year to see how the use of gender-fair language develops over time. The dataset also contains coordinates for creating maps.
Rights:
Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB