Language: English / Rights: PUB - LINDAT/CLARIAH-CZ Catalog Search Results

111. Khresmoi Query Translation Test Data 1.0

Creator:: Pecina, Pavel, Dušek, Ondřej, Hajič, Jan, and Urešová, Zdeňka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: corpus, test data, medical, health, machine translation, Czech, French, German, and English
Language:: English, French, German, and Czech
Description:: This package contains data sets for development and testing of machine translation of medical search short queries between Czech, English, French, and German. The queries come from general public and medical experts. and This work was supported by the EU FP7 project Khresmoi (European Comission contract No. 257528). The language resources are distributed by the LINDAT/Clarin project of the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2010013). We thank Health on the Net Foundation for granting the license for the English general public queries, TRIP database for granting the license for the English medical expert queries, and three anonymous translators and three medical experts for translating amd revising the data.
Rights:: Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0), http://creativecommons.org/licenses/by-nc/3.0/, and PUB

112. Khresmoi Query Translation Test Data 2.0

Creator:: Pecina, Pavel, Dušek, Ondřej, Hajič, Jan, Libovický, Jindřich, and Urešová, Zdeňka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: corpus, test data, medical, health, machine translation, Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
Language:: Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
Description:: This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish ans Swedish. The queries come from general public and medical experts. This is version 2.0 extending the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

113. Khresmoi Summary Translation Test Data 1.1

Creator:: Dušek, Ondřej, Hajič, Jan, Hlaváčová, Jaroslava, Pecina, Pavel, Tamchyna, Aleš, and Urešová, Zdeňka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: corpus, test data, medical, health, machine translation, Czech, French, German, and English
Language:: English, Czech, French, and German
Description:: This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German. and This work was supported by the EU FP7 project Khresmoi (European Comission contract No. 257528). The language resources are distributed by the LINDAT/Clarin project of the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2010013). We thank all the data providers and copyright holders for providing the source data and anonymous experts for translating the sentences.
Rights:: Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0), http://creativecommons.org/licenses/by-nc/3.0/, and PUB

114. Khresmoi Summary Translation Test Data 2.0

Creator:: Dušek, Ondřej, Hajič, Jan, Hlaváčová, Jaroslava, Libovický, Jindřich, Pecina, Pavel, Tamchyna, Aleš, and Urešová, Zdeňka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: corpus, test data, medical, health, machine translation, Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
Language:: Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish
Description:: This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech, English, French, German, Hungarian, Polish, Spanish and Swedish. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

115. KonText Web Demo

Creator:: Josífko, Michal
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and tool
Subject:: web service, corpus, parallel corpus, and demo
Language:: Czech and English
Description:: An interactive web demo for querying selected ÚFAL and LINDAT corpora. LINDAT/CLARIN KonText is a fork of ÚČNK KonText (https://github.com/czcorpus/kontext, maintained by Tomáš Machálek) that contains some modifications and additional features. Kontext, in turn, is a fork of the Bonito 2.68 python web interface to the corpus management tool Manatee (http://nlp.fi.muni.cz/trac/noske, created by Pavel Rychlý).
Rights:: GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB

116. Korektor 2

Creator:: Straka, Milan and Richter, Michal
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: Korektor, spellchecker, spellchecking, grammar checker, and diacritical marks generation
Language:: English
Description:: Korektor is a statistical spell-checker and (occasionally) grammar-checker. It is released under 2-Clause BSD license http://opensource.org/licenses/BSD-2-Clause. Korektor started with Michal Richter's diploma thesis Advanced Czech Spellchecker https://redmine.ms.mff.cuni.cz/documents/1, but it is being developed further. There are two versions: a command line utility (tested on Linux, Windows and OS X) and a REST service with publicly available API http://lindat.mff.cuni.cz/services/korektor/api-reference.php and HTML front end https://lindat.mff.cuni.cz/services/korektor/.
Rights:: BSD 2-Clause "Simplified" or "FreeBSD" license, http://opensource.org/licenses/BSD-2-Clause, and PUB

117. Large-Scale Colloquial Persian 0.5

Creator:: Abdi Khojasteh, Hadi, Ansari, Ebrahim, and Bohlouli, Mahdi
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and Institute for Advanced Studies in Basic Sciences (IASBS)
Type:: text and corpus
Subject:: PoS tagging, corpus, annotated corpus, multilingual, derivation, dependency parser, machine translation, informal language, spoken language, monolingual corpus, and bilingual corpus annotation
Language:: Persian, English, German, Czech, Italian, and Hindi
Description:: "Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in asemantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets with its dependency relations in syntactic annotation, Part-of-speech tags, sentiment polarity and automatic translation of original Persian sentences in five different languages (EN, CS, DE, IT, HI).
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

118. Lingua::Interset 2.026

Creator:: Zeman, Daniel
Publisher:: Charles University, Faculty of Mathematics and Physics
Type:: tool and toolService
Subject:: morphology, part of speech, conversion, and tagset
Language:: Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Japanese, Multiple languages, and Portuguese
Description:: Lingua::Interset is a universal morphosyntactic feature set to which all tagsets of all corpora/languages can be mapped. Version 2.026 covers 37 different tagsets of 21 languages. Limited support of the older drivers for other languages (which are not included in this package but are available for download elsewhere) is also available; these will be fully ported to Interset 2 in future. Interset is implemented as Perl libraries. It is also available via CPAN.
Rights:: Artistic License (Perl) 1.0, http://opensource.org/licenses/Artistic-Perl-1.0, and PUB

119. LiStr: Linguistic Structure Induction Tookit

Creator:: Mareček, David and Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: parsing, unsupervised machine learning, machine translation, and grammar induction
Language:: English
Description:: This toolkit comprises the tools and supporting scripts for unsupervised induction of dependency trees from raw texts or texts with already assigned part-of-speech tags. There are also scripts for simple machine translation based on unsupervised parsing and scripts for minimally supervised parsing into Universal-Dependencies style.
Rights:: GNU General Public Licence, version 3, http://opensource.org/licenses/GPL-3.0, and PUB

120. LongEval Click-Model Relevance Judgements (Qrels)

Creator:: Galuščáková, Petra, Devaud, Romain, Gonzalez-Saez, Gabriela, Mulhem, Philippe, Goeuriot, Lorraine, Piroi, Florina, and Popel, Martin
Publisher:: Université Grenoble Alpes
Type:: text and corpus
Subject:: information retrieval, automatic evaluation, and evaluation
Language:: French and English
Description:: The collection comprises the relevance judgments used in the 2023 LongEval Information Retrieval Lab (https://clef-longeval.github.io/), organized at CLEF. It consists of three sets of relevance judgments: 1) Relevance judgments for the heldout queries from the LongEval Train Collection (http://hdl.handle.net/11234/1-5010). 2) Relevance judgments for the short-term persistence (sub-task A) queries from the LongEval Test Collection (http://hdl.handle.net/11234/1-5139). 3) Relevance judgments for the long-term persistence (sub-task B) queries from the LongEval Test Collection (http://hdl.handle.net/11234/1-5139). These judgments were provided by the Qwant search engine (https://www.qwant.com) and were generated using a click model. The click model output was based on the clicks of Qwant's users, but it mitigates noise from raw user clicks caused by positional bias and also better safeguards users' privacy. Consequently, it can serve as a reliable soft relevance estimate for evaluating and training models. The collection includes a total of 1,420 judgments for the heldout queries, with 74 considered highly relevant and 326 deemed relevant. For the short-term sub-task queries, there are 12,217 judgments, including 762 highly relevant and 2,608 relevant ones. As for the long-term sub-task queries, there are 13,467 judgments, with 936 being highly relevant and 2,899 relevant.
Rights:: Qwant LongEval Attribution-NonCommercial-ShareAlike License, https://lindat.mff.cuni.cz/repository/xmlui/page/Qwant_LongEval_BY-NC-SA_License, and PUB

111. Khresmoi Query Translation Test Data 1.0

112. Khresmoi Query Translation Test Data 2.0

113. Khresmoi Summary Translation Test Data 1.1

114. Khresmoi Summary Translation Test Data 2.0

115. KonText Web Demo

116. Korektor 2

117. Large-Scale Colloquial Persian 0.5

118. Lingua::Interset 2.026

119. LiStr: Linguistic Structure Induction Tookit

120. LongEval Click-Model Relevance Judgements (Qrels)

Limit your search

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Show values starting with

Creator

Show values starting with

Language

Show values starting with

Publisher

Show values starting with

Rights

Show values starting with

Subject

Show values starting with

Type

Show values starting with

Date

Original context has metadata only

Harvested from