Test data for the WMT 2017 Automatic post-editing task (the same used for the Sentence-level Quality Estimation task). It consists of German-English pairs (source and target) from the pharmacological domain, already tokenized. The test set contains 2,000 pairs. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Test data for the WMT 2017 Automatic post-editing task (the same used for the Sentence-level Quality Estimation task). It consists of 2,000 English-German pairs (source and target) from the IT domain, already tokenized. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Test data for the WMT 2018 Automatic post-editing task. It consists of English-German pairs (source and target) from the information technology domain, already tokenized. The test set contains 1,023 pairs. A neural machine translation system was used to generate the target segments. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Test data for the WMT 2018 Automatic post-editing task. It consists of English-German pairs (source and target) from the information technology domain, already tokenized. The test set contains 2,000 pairs. A phrase-based machine translation system was used to generate the target segments. This test set is sampled from the same dataset used for the 2016 and 2017 APE shared task editions. All data is provided by the EU project QT21 (http://www.qt21.eu/).
TECAT is a command-line tool for multi-label text categorization and evaluation. It is capable of combining multiple base binary classifiers (built-in and external ones).
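The idea of combining per-label binary classifiers for multi-label categorization is often called binary relevance. Below is a minimal sketch of that approach, not TECAT's actual implementation; the scikit-learn components, label names, and toy documents are illustrative assumptions.

```python
# Binary-relevance sketch: one binary classifier per label, combined for multi-label prediction.
# Illustrative only; TECAT's real base classifiers and features may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

docs = ["reset the router", "invoice overdue", "router firmware invoice"]   # toy corpus
labels = [["networking"], ["billing"], ["networking", "billing"]]            # multi-label targets

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)            # label indicator matrix, one column per label
vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# OneVsRestClassifier trains an independent binary LogisticRegression for each label column
clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)

pred = clf.predict(vec.transform(["router invoice problem"]))
print(mlb.inverse_transform(pred))       # predicted label set(s) for the new document
```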
TextGrid has purchased the Zeno.org online library (literary, historical, scientific, and other texts) and is progressively converting it to valid TEI.
Among other things, a collection of old herbal books, old cookery books, and texts on the history of the German language in print media.
It calculates the term frequency and the inverse document frequency (TF-IDF) of a word in a given corpus, a statistical measure used to evaluate how important a word is to a document in a collection or corpus.
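A minimal sketch of the underlying computation, assuming the common tf x log(N/df) formulation; the function name and toy corpus are illustrative, not the tool's actual code.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Compute TF-IDF scores for every word of every document in a tokenized corpus."""
    n_docs = len(corpus)
    # document frequency: number of documents in which each word occurs
    df = Counter(word for doc in corpus for word in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        scores.append({
            # term frequency (relative) times inverse document frequency
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
print(tf_idf(docs))
```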
The ACL RD-TEC 2.0 has been developed with the aim of providing a benchmark for the evaluation of methods for terminology extraction and classification, as well as entity recognition tasks, based on specialised text from the computational linguistics domain. This release of the corpus consists of 300 abstracts from articles in the ACL Anthology Reference Corpus, published between 1978 and 2006. In these abstracts, terms (i.e., single- or multi-word lexical units with a specialised meaning) are manually annotated. In addition to their boundaries in running text, annotated terms are classified into one of seven categories: method, tool, language resource (LR), LR product, model, measures and measurements, and other. To assess the quality of the annotations and to determine the difficulty of this task, more than 171 of the abstracts are annotated twice, independently by two annotators. In total, 6,818 terms are identified and annotated, resulting in a specialised vocabulary of 3,318 lexical forms mapped to 3,471 concepts.