Language: Russian / Rights: http://creativecommons.org/licenses/by-nc-sa/4.0/

Creator:: Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
Publisher:: Technische Universität Darmstadt
Type:: text and corpus
Subject:: CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
Language:: Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
Description:: A large web corpus (over 10 billion tokens) in more than 50 languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

Creator:: Stanislav Brouček
Publisher:: Ústav pro etnografii a folkloristiku ČSAV
Format:: print and 274 p. : photo supplements, maps
Type:: model:monograph and TEXT
Subject:: Ethnology. Ethnography. Folklore, 19th century, history, ethnography, folklore studies, Czech identity, Czechia, 39, 94(437.3), 398, 316.344.8(=162.3), (048.8), and 1
Language:: Czech, German, and Russian
Description:: Stanislav Brouček, Russian and German summaries, and joint Czech and German title
Rights:: http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public

Creator:: Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: coreference resolution, CorPipe, and CorefUD
Language:: Catalan, Czech, German, English, Spanish, French, Hungarian, Lithuanian, Norwegian Bokmål, Norwegian Nynorsk, Polish, Russian, and Turkish
Description:: The `corpipe23-corefud1.1-231206` is a `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released under the CC BY-NC-SA 4.0 license. The model is language agnostic (no _corpus id_ on input), so it can be used to predict coreference in any `mT5` language (for zero-shot evaluation, see the paper). Note, however, that empty nodes must already be present in the input; they are not predicted (the same setting as in the CRAC23 shared task).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

Format:: print
Type:: model:internalpart and TEXT
Language:: Czech, Russian, and English
Rights:: http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public

Format:: print
Type:: model:internalpart and TEXT
Language:: Czech, Russian, and English
Rights:: http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public

Format:: print
Type:: model:internalpart and TEXT
Language:: Czech, Russian, and English
Rights:: http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public

Type:: model:monograph and TEXT
Language:: Czech, English, Russian, and German
Rights:: http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public

Creator:: Variš, Dušan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: machine translation, neural machine translation, and transformer
Language:: English and Russian
Description:: En-Ru translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). The models were trained using the MCSQ social surveys dataset (available at https://repo.clarino.uib.no/xmlui/bitstream/handle/11509/142/mcsq_v3.zip). Their main intended use is in-domain translation of social surveys. The models are compatible with Tensor2Tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on the MCSQ test set (BLEU): en->ru: 64.3 (trained on genuine in-domain MCSQ data); ru->en: 74.7 (trained on additional backtranslated in-domain MCSQ data). Evaluated using multeval: https://github.com/jhclark/multeval
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

Creator:: Fedrová, Stanislava, Hejk, Jan, Jedličková, Alice, Ústav pro českou literaturu (Akademie věd ČR), and Mezi deklamovánkou a románem. Proměny žánrů v české a slovenské literatuře (2005 : Praha, Česko)
Publisher:: Ústav pro českou literaturu AV ČR
Format:: electronic and 187 leaves
Type:: model:monograph and TEXT
Subject:: Literature of various forms and genres (works about), 1620-1989, 19th-20th century, Czech literature, Slovak literature, literary genres, literary analyses, literary criticism and history, literary forms, 821.162.3, 821.162.4, 82-1/-9, 82.09, (062.534), 11, and 82-1/-8
Language:: Czech, Slovak, English, German, and Russian
Description:: Stanislava Fedrová, Jan Hejk, Alice Jedličková (eds.), includes bibliographies and bibliographical references, and text partly in Slovak, with English, German, and Russian summaries
Rights:: http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public

Creator:: Cotgrove, Louis Alexander
Publisher:: University of Nottingham
Type:: text and corpus
Subject:: youth language, Computer-Mediated Communication, Digitally-Mediated Communication, CMC, DMC, online, YouTube, digital, emoji, translanguaging, multilingualism, and social media
Language:: German, English, Russian, Turkish, and Serbo-Croatian
Description:: The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments on videos published between 2008 and 2018 and targeted at a young, German-speaking demographic. It represents an authentic language snapshot of young German speakers. The corpus was proportionally sampled by video category and year from a database of 112 popular German-speaking YouTube channels in the DACH region for optimal representativeness and balance. It contains a considerable amount of associated metadata for each comment, which enables further longitudinal cross-sectional analyses.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB