This corpus consists of full transcriptions of both Democratic and Republican 2016 presidential candidate debates, with a special focus on the idiolects of Hillary Clinton and Donald Trump against the background of the speeches of other candidates for the post of president of the United States.
The transcriptions are sourced from the American Presidency Project at the University of California, Santa Barbara. Any use of the material requires prior, explicit written permission from the project administrator (contact policy@ucsb.edu). This corpus material is shared with their kind permission.
This editor was developed especially for the needs of the KAMOKO project (https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-3261). The editor allows the quick entry of example sentences and sentence variants as well as the corresponding speaker ratings.
KAMOKO is a structured and commented French learner corpus. It addresses the central structures of the French language from a linguistic perspective (18 different courses). The text examples in the corpus are annotated by native speakers, which makes it a valuable resource for (1) advanced language practice and teaching and (2) linguistic research.
The KAMOKO corpus can be used free of charge. The structure of the corpus and instructions for using it are described in detail in the KAMOKO Handbook and a video tutorial (both in German). In addition to the raw XML data, we also offer various export formats (see the ZIP files; supported file formats: CorpusExplorer, TXM, WebLicht, TreeTagger, CoNLL, SPEEDy, CorpusWorkbench and TXT).
This package contains data sets for development and testing of machine translation of short medical search queries between Czech, English, French, and German. The queries come from the general public and medical experts. This work was supported by the EU FP7 project Khresmoi (European Commission contract No. 257528). The language resources are distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2010013).
We thank the Health on the Net Foundation for granting the license for the English general public queries, the TRIP database for granting the license for the English medical expert queries, and three anonymous translators and three medical experts for translating and revising the data.
This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish. The queries come from the general public and medical experts. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German. This work was supported by the EU FP7 project Khresmoi (European Commission contract No. 257528). The language resources are distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2010013). We thank all the data providers and copyright holders for providing the source data and the anonymous experts for translating the sentences.
This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech, English, French, German, Hungarian, Polish, Spanish, and Swedish. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
An interactive web demo for querying selected ÚFAL and LINDAT corpora. LINDAT/CLARIN KonText is a fork of ÚČNK KonText (https://github.com/czcorpus/kontext, maintained by Tomáš Machálek) that contains some modifications and additional features. KonText, in turn, is a fork of Bonito 2.68, the Python web interface to the corpus management tool Manatee (http://nlp.fi.muni.cz/trac/noske, created by Pavel Rychlý).
"Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in asemantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets with its dependency relations in syntactic annotation, Part-of-speech tags, sentiment polarity and automatic translation of original Persian sentences in five different languages (EN, CS, DE, IT, HI).