Language: French / Type: corpus - LINDAT/CLARIAH-CZ Catalog Search Results

31. Deltacorpus

Creator:: Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: part of speech, tagging, semi-supervised, and cross-language
Language:: Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
Description:: Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

32. Deltacorpus 1.1

Creator:: Mareček, David, Yu, Zhiwei, Zeman, Daniel, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: part of speech, tagging, semi-supervised, and cross-language
Language:: Belarusian, Bosnian, Bulgarian, Czech, Serbo-Croatian, Croatian, Upper Sorbian, Macedonian, Polish, Russian, Slovak, Slovenian, Serbian, Ukrainian, Latvian, Lithuanian, Afrikaans, Danish, German, English, Faroese, Western Frisian, Swiss German, Icelandic, Limburgan, Luxembourgish, Low German, Dutch, Norwegian Nynorsk, Norwegian, Scots, Swedish, Yiddish, Aragonese, Asturian, Catalan, French, Galician, Haitian, Italian, Latin, Lombard, Neapolitan, Piemontese, Portuguese, Romanian, Spanish, Venetian, Walloon, Breton, Welsh, Scottish Gaelic, Irish, Modern Greek (1453-), Armenian, Albanian, Dimli (individual language), Persian, Gilaki, Kurdish, Tajik, Bengali, Bishnupriya, Gujarati, Fiji Hindi, Hindi, Marathi, Nepali (macrolanguage), Urdu, Amharic, Arabic, Egyptian Arabic, Hebrew, Estonian, Finnish, Hungarian, Basque, Georgian, Chuvash, Azerbaijani, Turkish, Uzbek, Kazakh, Tatar, Yakut, Korean, Mongolian, Telugu, Kannada, Malayalam, Tamil, Newari, Vietnamese, Indonesian, Javanese, Malagasy, Maori, Malay (macrolanguage), Pampanga, Sundanese, Tagalog, Waray (Philippines), Swahili (macrolanguage), Esperanto, Ido, Interlingua (International Auxiliary Language Association), and Volapük
Description:: Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia). Changes in version 1.1: 1. The Universal Dependencies tagset replaces the older and smaller Google Universal POS tagset. 2. The SVM classifier is trained on Universal Dependencies 1.2 instead of HamleDT 2.0. 3. Balto-Slavic, Germanic and Romance languages were tagged by a classifier trained only on the respective language group; all other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

33. Digitale Sammlungen der Universitäts- und Landesbibliothek Münster

Publisher:: Westfälische Wilhelms-Universität Münster
Type:: corpus
Subject:: German studies (Germanistik)
Language:: French, German, and Latin
Description:: Digitized copies of books and journals from the historical holdings of the ULB Münster, as well as collections from the region of Westphalia.
Rights:: Not specified

34. DiscoMT 2015 Shared Task on Pronoun Translation

Creator:: Hardmeier, Christian, Tiedemann, Jörg, Nakov, Preslav, Stymne, Sara, and Versley, Yannick
Publisher:: Uppsala University
Type:: text and corpus
Subject:: machine translation, coreference resolution, anaphora resolution, and discourse
Language:: English and French
Description:: The data set includes training, development and test data from the shared tasks on pronoun-focused machine translation and cross-lingual pronoun prediction held at the EMNLP 2015 workshop on Discourse in Machine Translation (DiscoMT 2015). The release also contains the submissions to the pronoun-focused machine translation task, along with the manual annotations used for the official evaluation, as well as gold-standard annotations of pronoun coreference for the shared task test set.
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

35. DiscoMT 2016 Shared Task on Cross-lingual Pronoun Prediction

Creator:: Guillou, Liane, Hardmeier, Christian, Nakov, Preslav, Stymne, Sara, Tiedemann, Jörg, Versley, Yannick, Cettolo, Mauro, Webber, Bonnie, and Popescu-Belis, Andrei
Publisher:: Uppsala University
Type:: text and corpus
Subject:: machine translation, coreference, discourse, and pronouns
Language:: English, French, and German
Description:: Files for the DiscoMT 2016 shared task on cross-lingual pronoun prediction
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

36. DiscoMT 2017 Shared Task on Cross-lingual Pronoun Prediction

Creator:: Loáiciga, Sharid, Stymne, Sara, Nakov, Preslav, Hardmeier, Christian, Tiedemann, Jörg, Cettolo, Mauro, and Versley, Yannick
Publisher:: Uppsala University
Type:: text and corpus
Subject:: machine translation, discourse, coreference, and pronouns
Language:: English, Spanish, German, and French
Description:: Data used in the 2017 shared task on cross-lingual pronoun prediction.
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

37. DPC (Dutch Parallel Corpus)

Publisher:: Katholieke Universiteit Leuven Campus Kortrijk, Hogeschool Gent
Type:: corpus
Language:: Dutch, English, and French
Description:: Parallel corpus with Dutch as the primary language, 10 million words (under construction). DPC is a STEVIN project.
Rights:: Not specified

38. Extended CLEF eHealth 2013-2015 IR Test Collection

Creator:: Pecina, Pavel and Saleh, Shadi
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: cross-lingual information retrieval and machine translation
Language:: English, Czech, French, German, Hungarian, Polish, Spanish, and Swedish
Description:: This package contains an extended version of the test collection used in the CLEF eHealth Information Retrieval tasks in 2013–2015. Compared to the original version, it provides complete query translations into Czech, French, German, Hungarian, Polish, Spanish and Swedish, as well as additional relevance assessments.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

39. Frantext

Publisher:: ATILF
Type:: corpus
Language:: French
Description:: Mainly literary texts from the 17th to the 20th century.
Rights:: Not specified

40. French emblems at Glasgow

Publisher:: University of Glasgow
Type:: corpus
Language:: French
Description:: French emblem books of the 16th century (27 in total), together with their Latin versions where appropriate. Includes transcribed and facsimile versions, with extensive search functionality.
Rights:: Not specified
