Search Results
2. DiscoMT 2017 Shared Task on Cross-lingual Pronoun Prediction
- Creator:
- Loáiciga, Sharid, Stymne, Sara, Nakov, Preslav, Hardmeier, Christian, Tiedemann, Jörg, Cettolo, Mauro, and Versley, Yannick
- Publisher:
- Uppsala University
- Type:
- text and corpus
- Subject:
- machine translation, discourse, coreference, and pronouns
- Language:
- English, Spanish, German, and French
- Description:
- Data used in the 2017 shared task on cross-lingual pronoun prediction.
- Rights:
- Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
3. Extended CLEF eHealth 2013-2015 IR Test Collection
- Creator:
- Pecina, Pavel and Saleh, Shadi
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- cross-lingual information retrieval and machine translation
- Language:
- English, Czech, French, German, Hungarian, Polish, Spanish, and Swedish
- Description:
- This package contains an extended version of the test collection used in the CLEF eHealth Information Retrieval tasks in 2013-2015. Compared to the original version, it provides complete query translations into Czech, French, German, Hungarian, Polish, Spanish, and Swedish, and additional relevance assessments.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
4. Khresmoi Query Translation Test Data 1.0
- Creator:
- Pecina, Pavel, Dušek, Ondřej, Hajič, Jan, and Urešová, Zdeňka
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- corpus, test data, medical, health, machine translation, Czech, French, German, and English
- Language:
- English, French, German, and Czech
- Description:
- This package contains data sets for development and testing of machine translation of short medical search queries between Czech, English, French, and German. The queries come from the general public and medical experts. This work was supported by the EU FP7 project Khresmoi (European Commission contract No. 257528). The language resources are distributed by the LINDAT/Clarin project of the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2010013). We thank Health on the Net Foundation for granting the license for the English general public queries, TRIP database for granting the license for the English medical expert queries, and three anonymous translators and three medical experts for translating and revising the data.
- Rights:
- Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0), http://creativecommons.org/licenses/by-nc/3.0/, and PUB
5. Khresmoi Query Translation Test Data 2.0
- Creator:
- Pecina, Pavel, Dušek, Ondřej, Hajič, Jan, Libovický, Jindřich, and Urešová, Zdeňka
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- corpus, test data, medical, health, machine translation, Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
- Language:
- Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
- Description:
- This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish. The queries come from the general public and medical experts. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
6. Khresmoi Summary Translation Test Data 1.1
- Creator:
- Dušek, Ondřej, Hajič, Jan, Hlaváčová, Jaroslava, Pecina, Pavel, Tamchyna, Aleš, and Urešová, Zdeňka
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- corpus, test data, medical, health, machine translation, Czech, French, German, and English
- Language:
- English, Czech, French, and German
- Description:
- This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German. This work was supported by the EU FP7 project Khresmoi (European Commission contract No. 257528). The language resources are distributed by the LINDAT/Clarin project of the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2010013). We thank all the data providers and copyright holders for providing the source data and anonymous experts for translating the sentences.
- Rights:
- Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0), http://creativecommons.org/licenses/by-nc/3.0/, and PUB
7. Khresmoi Summary Translation Test Data 2.0
- Creator:
- Dušek, Ondřej, Hajič, Jan, Hlaváčová, Jaroslava, Libovický, Jindřich, Pecina, Pavel, Tamchyna, Aleš, and Urešová, Zdeňka
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- corpus, test data, medical, health, machine translation, Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
- Language:
- Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
- Description:
- This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech, English, French, German, Hungarian, Polish, Spanish and Swedish. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
8. Large-Scale Colloquial Persian 0.5
- Creator:
- Abdi Khojasteh, Hadi, Ansari, Ebrahim, and Bohlouli, Mahdi
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and Institute for Advanced Studies in Basic Sciences (IASBS)
- Type:
- text and corpus
- Subject:
- PoS tagging, corpus, annotated corpus, multilingual, derivation, dependency parser, machine translation, informal language, spoken language, monolingual corpus, and bilingual corpus annotation
- Language:
- Persian, English, German, Czech, Italian, and Hindi
- Description:
- "Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets, with dependency relations in syntactic annotation, part-of-speech tags, sentiment polarity, and automatic translations of the original Persian sentences into five languages (EN, CS, DE, IT, HI).
- Rights:
- Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
9. MCSQ Translation Models (en-de) (v1.0)
- Creator:
- Variš, Dušan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- machine translation, transformer, and neural machine translation
- Language:
- English and German
- Description:
- En-De translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). The models were trained using the MCSQ social surveys dataset (available at https://repo.clarino.uib.no/xmlui/bitstream/handle/11509/142/mcsq_v3.zip). Their main use should be in-domain translation of social surveys. Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on the MCSQ test set (BLEU): en->de: 67.5 (train: genuine in-domain MCSQ data only); de->en: 75.0 (train: additional in-domain backtranslated MCSQ data). Evaluated using multeval: https://github.com/jhclark/multeval
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
10. ParaCrawl Corpus version 1.0
- Creator:
- Koehn, Philipp, Heafield, Kenneth, Forcada, Mikel L., Esplà-Gomis, Miquel, Ortiz-Rojas, Sergio, Sánchez, Gema Ramírez, Cartagena, Víctor M. Sánchez, Haddow, Barry, Bañón, Marta, Střelec, Marek, Samiotou, Anna, and Kamran, Amir
- Publisher:
- ParaCrawl
- Type:
- text and corpus
- Subject:
- ParaCrawl, parallel corpus, CommonCrawl, machine translation, and text corpora
- Language:
- English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Romanian, Finnish, Latvian, Russian, and Estonian
- Description:
- The January 2018 release of ParaCrawl is the first version of the corpus. It contains parallel corpora for 11 languages paired with English, crawled from a large number of websites. The selection of websites is based on CommonCrawl, but ParaCrawl is extracted from a brand new crawl which has much higher coverage of these selected websites than CommonCrawl. Since the data is fairly raw, it is released with two quality metrics that can be used for corpus filtering. An official "clean" version of each corpus uses one of the metrics. For more details and raw data download, please visit: http://paracrawl.eu/releases.html
- Rights:
- Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB