Many studies in cognitive linguistics have analysed the semantics of 'over', notably the
semantics associated with 'over' as a preposition. Most of them conclude that 'over' is
polysemic and that this polysemy should be described by means of a radial semantic network, showing
the relationships between the different meanings of the word. What we would like to suggest
on the contrary is that the meanings of 'over' are highly dependent on the utterance context in
which its occurrences are embedded, and consequently that the meaning of 'over' itself is
under-specified, rather than polysemic. Moreover, to provide a more accurate account of the
apparent wide range of meanings of 'over' in context, we ought to take into account the other
uses of this unit: as an adverb and particle, and not only as a preposition. In this paper, we
provide a corpus-based description of 'over' which leads us to propose a monosemic definition. To achieve such a description, we used a small dataset of 326 randomly selected sentences containing 'over' in various positions in the sentence and belonging to various categories.
We present the Czech Court Decisions Dataset (CCDD) -- a dataset of 300 manually annotated court decisions published by The Supreme Court of the Czech Republic and the Constitutional Court of the Czech Republic.
The Dialogy.Org system allows users to search in transcribed audio-visual corpora. Dialogy.Org uses a web-based interface, so no additional programs need to be installed on your computer. Flash Player is required to play audio or video recordings. This work has used language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu).
The texts are sentences from the Europarl parallel corpus (Koehn, 2005). We selected the monolingual sentences from the parallel corpora for the following pairs: Bulgarian-English, Czech-English, Portuguese-English and Spanish-English. The English corpus consists of the English side of the Spanish-English corpus.
Basque is not in Europarl. In addition, the dataset contains the Basque and English sides of the GNOME corpus.
The texts have been automatically annotated with NLP tools, including Word Sense Disambiguation, Named Entity Disambiguation and Coreference Resolution. Please check deliverable D5.6 at http://qtleap.eu/deliverables for more information.
An interactive web demo for querying selected ÚFAL and LINDAT corpora. LINDAT/CLARIN KonText is a fork of ÚČNK KonText (https://github.com/czcorpus/kontext, maintained by Tomáš Machálek) that contains some modifications and additional features. KonText, in turn, is a fork of the Bonito 2.68 Python web interface to the corpus management tool Manatee (http://nlp.fi.muni.cz/trac/noske, created by Pavel Rychlý).
One of the goals of the LINDAT/CLARIN Centre for Language Research Infrastructure is to provide technical background to institutions or researchers who want to share the tools and data they use for research in linguistics or related fields. The digital repository is built on a highly customised DSpace platform. LM2010013: fully supported by the Ministry of Education, Sports and Youth of the Czech Republic.
An LMF-conformant XML-based file containing all Arabic characters (letters, vowels and punctuation marks). Each character is given with a description, its different display forms (isolated, and at the beginning, middle and end of a word), a codification (Unicode; others could be added later), and two transliterations (Buckwalter and wiki).
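A character entry of the kind described above could be modelled roughly as follows. This is only a sketch: the class name `ArabicCharEntry`, its field names and the helper `base_letter` are hypothetical and do not reflect the actual LMF schema of the file. The code points are the standard Unicode positional presentation forms of the letter beh; the wiki transliteration is left out because its scheme is not specified here.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class ArabicCharEntry:
    """Hypothetical record for one character; not the file's real schema."""
    description: str   # human-readable description of the character
    codepoint: str     # Unicode codification of the base character
    isolated: str      # display form when the letter stands alone
    initial: str       # display form at the beginning of a word
    medial: str        # display form in the middle of a word
    final: str         # display form at the end of a word
    buckwalter: str    # Buckwalter transliteration


# Example entry for the letter beh (U+0628) and its four presentation forms.
beh = ArabicCharEntry(
    description=unicodedata.name("\u0628"),
    codepoint="U+0628",
    isolated="\uFE8F",
    initial="\uFE91",
    medial="\uFE92",
    final="\uFE90",
    buckwalter="b",
)


def base_letter(form: str) -> str:
    """Map a positional presentation form back to its base letter (NFKC)."""
    return unicodedata.normalize("NFKC", form)


for form in (beh.isolated, beh.initial, beh.medial, beh.final):
    print(unicodedata.name(form), "->", unicodedata.name(base_letter(form)))
```

Storing the four display forms explicitly mirrors the description above; in running text, rendering engines normally choose the positional form from the base code point automatically.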
An LMF-conformant XML-based file containing the electronic version of the al logha al arabia al moassira (Contemporary Arabic) dictionary, an Arabic monolingual dictionary compiled by Ahmed Mukhtar Abdul Hamid Omar (deceased: 1424) with the help of a working group.