Harvested from: LINDAT/CLARIAH-CZ repository / Language: Czech - LINDAT/CLARIAH-CZ Catalog Search Results

Start Over Language Czech Harvested from LINDAT/CLARIAH-CZ repository

581. WMT 2011 Testing Set

Creator:: Galuščáková, Petra and Bojar, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: WMT, test data, and Slovak
Language:: Slovak, Czech, and English
Description:: Testing set from WMT 2011 [1] competition, manually translated from Czech and English into Slovak. Test set contains 3003 sentences in Czech, Slovak and English. Test set is described in [2]. References: [1] http://www.statmt.org/wmt11/evaluation-task.html [2] Petra Galuščáková and Ondřej Bojar. Improving SMT by Using Parallel Data of a Closely Related Language. In Human Language Technologies - The Baltic Perspective - Proceedings of the Fifth International Conference Baltic HLT 2012, volume 247 of Frontiers in AI and Applications, pages 58-65, Amsterdam, Netherlands, October 2012. IOS Press. and The work on this project was supported by the grant EuroMatrixPlus (FP7-ICT- 2007-3-231720 of the EU and 7E09003 of the Czech Republic)
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

582. WMT16 Tuning Shared Task Models (Czech-to-English)

Creator:: Kamran, Amir, Jawaid, Bushra, Bojar, Ondřej, and Stanojevic, Milos
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and University of Amsterdam, ILLC
Type:: text and corpus
Subject:: WMT16, machine translation, tuning, baseline models, and shared task
Language:: Czech and English
Description:: The item contains models to tune for the WMT16 Tuning shared task for Czech-to-English. CzEng 1.6pre (http://ufal.mff.cuni.cz/czeng/czeng16pre) corpus is used for the training of the translation models. The data is tokenized (using Moses tokenizer), lowercased and sentences longer than 60 words and shorter than 4 words are removed before training. Alignment is done using fast_align (https://github.com/clab/fast_align) and the standard Moses pipeline is used for training. Two 5-gram language models are trained using KenLM: one only using the CzEng English data and the other is trained using all available English mono data for WMT except Common Crawl. Also included are two lexicalized bidirectional reordering models, word based and hierarchical, with msd conditioned on both source and target of processed CzEng.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

583. WMT16 Tuning Shared Task Models (English-to-Czech)

Creator:: Kamran, Amir, Jawaid, Bushra, Bojar, Ondřej, and Stanojevic, Milos
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and University of Amsterdam, ILLC
Type:: text and corpus
Subject:: WMT16, machine translation, tuning, baseline models, and shared task
Language:: English and Czech
Description:: This item contains models to tune for the WMT16 Tuning shared task for English-to-Czech. CzEng 1.6pre (http://ufal.mff.cuni.cz/czeng/czeng16pre) corpus is used for the training of the translation models. The data is tokenized (using Moses tokenizer), lowercased and sentences longer than 60 words and shorter than 4 words are removed before training. Alignment is done using fast_align (https://github.com/clab/fast_align) and the standard Moses pipeline is used for training. Two 5-gram language models are trained using KenLM: one only using the CzEng Czech data and the other is trained using all available Czech mono data for WMT except Common Crawl. Also included are two lexicalized bidirectional reordering models, word based and hierarchical, with msd conditioned on both source and target of processed CzEng.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

584. WMT18 Quality Estimation Shared Task Test Data

Creator:: Specia, Lucia, Logacheva, Varvara, Blain, Frederic, Fernandez, Ramon, and Martins, André
Publisher:: University of Sheffield
Type:: text and corpus
Subject:: machine translation, quality estimation, and machine learning
Language:: English, German, Czech, and Latvian
Description:: Test data for the WMT18 QE task. Train data can be downloaded from http://hdl.handle.net/11372/LRT-2619. This shared task will build on its previous six editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks make use of datasets produced from post-editions by professional translators. The datasets are domain-specific (IT and life sciences/pharma domains) and extend from those used previous years with more instances and more languages. One important addition is that this year we also include datasets with neural MT outputs. In addition to advancing the state of the art at all prediction levels, our specific goals are: To study the performance of quality estimation approaches on the output of neural MT systems. We will do so by providing datasets for two language language pairs where the same source segments are translated by both a statistical phrase-based and a neural MT system. To study the predictability of deleted words, i.e. words that are missing in the MT output. TO do so, for the first time we provide data annotated for such errors at training time. To study the effectiveness of explicitly assigned labels for phrases. We will do so by providing a dataset where each phrase in the output of a phrase-based statistical MT system was annotated by human translators. To study the effect of different language pairs. We will do so by providing datasets created in similar ways for four language language pairs. To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits. Measure progress over years at all prediction levels. We will do so by using last year's test set for comparative experiments. In-house statistical and neural MT systems were built to produce translations for all tasks. MT system-dependent information can be made available under request. The data is publicly available but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications on the use of this data for research purposes. Participants are allowed to explore any additional data and resources deemed relevant.
Rights:: AGREEMENT ON THE USE OF DATA IN QT21, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

585. WMT18 Quality Estimation Shared Task Training and Development Data

Creator:: Specia, Lucia, Logacheva, Varvara, Blain, Frederic, Fernandez, Ramon, and Martins, André
Publisher:: University of Sheffield
Type:: text and corpus
Subject:: machine translation, quality estimation, and machine learning
Language:: English, German, Czech, and Latvian
Description:: Training and development data for the WMT18 QE task. Test data will be published as a separate item. This shared task will build on its previous six editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks make use of datasets produced from post-editions by professional translators. The datasets are domain-specific (IT and life sciences/pharma domains) and extend from those used previous years with more instances and more languages. One important addition is that this year we also include datasets with neural MT outputs. In addition to advancing the state of the art at all prediction levels, our specific goals are: To study the performance of quality estimation approaches on the output of neural MT systems. We will do so by providing datasets for two language language pairs where the same source segments are translated by both a statistical phrase-based and a neural MT system. To study the predictability of deleted words, i.e. words that are missing in the MT output. TO do so, for the first time we provide data annotated for such errors at training time. To study the effectiveness of explicitly assigned labels for phrases. We will do so by providing a dataset where each phrase in the output of a phrase-based statistical MT system was annotated by human translators. To study the effect of different language pairs. We will do so by providing datasets created in similar ways for four language language pairs. To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits. Measure progress over years at all prediction levels. We will do so by using last year's test set for comparative experiments. In-house statistical and neural MT systems were built to produce translations for all tasks. MT system-dependent information can be made available under request. The data is publicly available but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications on the use of this data for research purposes. Participants are allowed to explore any additional data and resources deemed relevant.
Rights:: AGREEMENT ON THE USE OF DATA IN QT21, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

586. Word representations for multiple languages

Creator:: Müller, Thomas and Schütze, Hinrich
Publisher:: Center for Information and Language Processing, University of Munich
Type:: text and corpus
Subject:: morphological dictionary, morphological analysis, and PoS tagging
Language:: English, German, Latin, Hungarian, Spanish, and Czech
Description:: Dictionaries with different representations for various languages. Representations include brown clusters of different sizes and morphological dictionaries extracted using different morphological analyzers. All representations cover the most frequent 250,000 word types on the Wikipedia version of the respective language. Analzers used: MAGYARLANC (Hungarian, Zsibrita et al. (2013)), FREELING (English and Spanish, Padro and Stanilovsky (2012)), SMOR (German, Schmid et al. (2004)), an MA from Charles University (Czech, Hajic (2001)) and LATMOR (Latin, Springmann et al. (2014)).
Rights:: Creative Commons - Attribution 3.0 Unported (CC BY 3.0), http://creativecommons.org/licenses/by/3.0/, and PUB

587. WordSim353-cs: Evaluation Dataset for Lexical Similarity and Relatedness, based on WordSim353

Creator:: Cinková, Silvie, Straková, Jana, Hajič, Jakub, Hajič, Jan, Hajič, Jan, jr., Janoušková, Jolana, Straka, Milan, and Urešová, Miroslava
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, wordList, and lexicalConceptualResource
Subject:: lexical semantics, similarity, relatedness, evaluation, and distributional semantics
Language:: Czech and English
Description:: Czech translation of WordSim353. The Czech translation of English WordSim353 word pairs were obtained from four translators. All translation variants were scored according to the lexical similarity/relatedness annotation instructions for WordSim353 annotators, by 25 Czech annotators. The resulting data set consists of two annotation files: "WordSim353-cs.csv" and "WordSim-cs-Multi.csv". Both files are encoded in UTF-8, have a header, text is enclosed in double quotes, and columns are separated by commas. The rows are numbered. The WordSim-cs-Multi data set has rows numbered from 1 to 634, whereas the row indices in the WordSim353-cs data set reflect the corresponding row numbers in the WordSim-cs-Multi data set. The WordSim353-cs file contains a one-to-one mapping selection of 353 Czech equivalent pairs whose judgments have proven to be most similar to the judgments of their corresponding English originals (compared by the absolute value of the difference between the means over all annotators in each language counterpart). In one case ("psychology-cognition"), two Czech equivalent pairs had identical means as well as confidence intervals, so we randomly selected one. The "WordSim-cs-Multi.csv" file contains human judgments for all translation variants. In both data sets, we preserved all 25 individual scores. In the WordSim353-cs data set, we added a column with their Czech means as well as a column containing the original English means and 95% confidence intervals in separate columns for each mean (computed by the CI function in the Rmisc R package). The WordSim-cs-Multi data set contains only the Czech means and confidence intervals. For the most convenient lexical search, we provided separate columns with the respective Czech and English single words, entire word pairs, and eventually an English-Czech quadruple in both data sets. The data set also contains an xls table with the four translations and a preliminary selection of the best variants performed by an adjudicator.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

588. Workers' Holiday in Český Šternberk

Creator:: Aktualita
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: akce zotavovací Heydrich Reinhard, rekreace Protektorát, lehátka opalovací, lidé opalující se, zotavovna dělnická exteriér, jídelna, číšník, lidé stravující se, muž pijící pivo, volejbal rekreační, rybář, lidé koupající se, řeka, jez, Protektorát rekreace, hrad exteriér, Heydrichiáda, Places::Český Šternberk::hrad::celkově, Places::Český Šternberk::zotavovna, Places::řeka Sázava::Český Šternberk, and Český zvukový týdeník Aktualita::1942/32A
Language:: Czech
Description:: Segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1942, issue no. 32B, reports on a workers´ holiday organized by the Reinhard Heydrich Foundation for Workers´ Recuperation in Český Šternberk. A view of the exterior of the health resort. Holidaymakers are sunbathing on the terrace. A waiter is carrying plates full of food in the dining room. People are eating. A close-up of a man drinking beer from a beer mug. Holidaymakers playing volleyball. A fisherman is sitting on the bank of the Sázava River. People are bathing in the river and in the weir. Český Šternberk Castle can be seen in the background.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

589. Workers' Holiday in Luhačovice

Creator:: Aktualita
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: akce zotavovací Heydrich Reinhard, rekreace Protektorát, nádraží, vlak přijíždějící, Národní odborová ústředna zaměstnanecká akce, akce Národní odborová ústředna zaměstnanecká, hotel exteriér, projev veřejný, restaurace lázeňská, reportér rozhlasový, mikrofon rozhlasový, oběd pro rekreanty, volejbal rekreační, dívky vijící věnce, prameny léčivé, rekreanti u minerálních pramenů, prameny minerální, Protektorát rekreace, Protektorát stravování, kuželky, vití věnců, terasa hotelová, věnce vití, Heydrichiáda, Places::Luhačovice::nádraží, Places::Luhačovice::kolonáda, Places::Luhačovice::hotel Lázeňský dům /ext./, Places::Luhačovice::dělnická zotavovna /ext./, Places::Luhačovice::hotel Bezděz /ext./, Places::Luhačovice::Luhačovická přehrada::Pozlovice, and Český zvukový týdeník Aktualita::1942/24A
Language:: Czech
Description:: Segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1942, issue no. 24A, reports on a workers´ holiday organized by the Reinhard Heydrich Foundation for Workers´ Recuperation in Luhačovice. Footage of a train arriving at the railway station and the welcoming of the holidaymakers. Lunch is ready for visitors at a local restaurant. Holidaymakers rest on the hotel terrace, some play volleyball or skittles. Others explore the surrounding countryside. Footage of a walk to the Luhačovice Dam. Girls sit on the grass, weaving flower wreaths. Holidaymakers taste the local mineral water.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

590. Workers' Holiday in Věšín

Creator:: Aktualita
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: akce zotavovací Heydrich Reinhard, rekreace Protektorát, zotavovna dělnická, rozcvička, lidé stravující se, číšník, jídelna pod širým nebem, boxování, bazén plavecký, házení do vody, lidé plavající, nápis Vítáme vás v dělnické zotavovně, cedule obce Věšín okres Blatná, Protektorát rekreace, Protektorát stravování, Heydrichiáda, Places::Věšín u Blatné::zotavovna, and Český zvukový týdeník Aktualita::1942/32A
Language:: Czech
Description:: Segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1942, issue no. 32A, reports on a workers´ holiday organized by the Reinhard Heydrich Foundation for Workers´ Recuperation in the village of Věšín u Blatné. Holidaymakers walk through the health resort´s gate. Morning exercise in the courtyard. A waiter carries plates full of food across the outdoor dining room, people are eating. Footage of holidaymakers enjoying leisure activities, an improvised boxing match, swimming in the pool, playing water sports. A view of an entrance arch with a sign saying "Welcome to the Workers´ Health Resort".
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

« Previous
Next »
1
2
…
55
56
57
58
59
60
61

581. WMT 2011 Testing Set

582. WMT16 Tuning Shared Task Models (Czech-to-English)

583. WMT16 Tuning Shared Task Models (English-to-Czech)

584. WMT18 Quality Estimation Shared Task Test Data

585. WMT18 Quality Estimation Shared Task Training and Development Data

586. Word representations for multiple languages

587. WordSim353-cs: Evaluation Dataset for Lexical Similarity and Relatedness, based on WordSim353

588. Workers' Holiday in Český Šternberk

589. Workers' Holiday in Luhačovice

590. Workers' Holiday in Věšín

Limit your search

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Show values starting with

Coverage

Creator

Show values starting with

Format

Language

Show values starting with

Publisher

Show values starting with

Rights

Show values starting with

Subject

Show values starting with

Type

Show values starting with

Date

Original context has metadata only

Harvested from