Harvested from: LINDAT/CLARIAH-CZ repository - LINDAT/CLARIAH-CZ Catalog Search Results

Search constraints: Harvested from LINDAT/CLARIAH-CZ repository; Date 2000 to 2023

31. CoNLL-based Extended Czech Named Entity Corpus 1.0

Creator:: Konkol, Michal, Konopík, Miloslav, Ševčíková, Magda, Žabokrtský, Zdeněk, and Straková, Jana
Publisher:: University of West Bohemia
Type:: text and corpus
Subject:: named entity recognition, Czech, and conll
Language:: Czech
Description:: This is the Czech Named Entity Corpus 1.0 transformed into the CoNLL format. The original corpus can be downloaded from: http://hdl.handle.net/11858/00-097C-0000-0023-1B04-C. The CoNLL transformation is described in this publication: https://link.springer.com/chapter/10.1007/978-3-642-40585-3_20.
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

32. CoNLL-based Extended Czech Named Entity Corpus 2.0

Creator:: Konkol, Michal, Konopík, Miloslav, Ševčíková, Magda, Žabokrtský, Zdeněk, Straková, Jana, and Straka, Milan
Publisher:: University of West Bohemia
Type:: text and corpus
Subject:: named entity recognition and Czech
Language:: Czech
Description:: This is the Czech Named Entity Corpus 2.0 transformed into the CoNLL format. The original corpus can be downloaded from: http://hdl.handle.net/11858/00-097C-0000-0023-1B22-8. The CoNLL transformation is described in this publication: https://link.springer.com/chapter/10.1007/978-3-642-40585-3_20.
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

33. Contemporary Arabic dictionary

Creator:: Namly, Driss
Publisher:: Ibtikarat Team
Type:: text, lexicon, and lexicalConceptualResource
Subject:: lexical semantics
Language:: Arabic
Description:: An XML file containing the electronic version of the al logha al arabia al moassira (Contemporary Arabic) dictionary, a monolingual Arabic dictionary compiled by Ahmed Mukhtar Abdul Hamid Omar (deceased: 1424) with the help of a working group.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

34. Copenhagen Dependency Treebanks versions 1-3

Publisher:: Copenhagen Business School
Format:: application/octet-stream
Type:: corpus
Subject:: parallel treebank, POS annotation, discourse annotation, morphological annotation, syntactic annotation, and semantic annotation
Language:: Danish, English, German, Italian, and Spanish
Description:: Parallel treebanks with annotation of syntax, discourse, coreference, morphology, and semantics. Version 3 also includes the Danish Dependency Treebank (version 1) and the Danish-English Parallel Dependency Treebank (version 2).
Rights:: GNU General Public License

35. Corpus "Miljons"

Publisher:: Institute of Mathematics and Computer Science, University of Latvia
Format:: text/plain
Type:: corpus
Subject:: balanced corpus
Language:: Latvian
Description:: Balanced corpus of Modern Latvian (~1 million running words, currently in plain text), publicly available via the Bonito interface.
Rights:: Not specified

36. Corpus "Plāns ledus"

Publisher:: Institute of Mathematics and Computer Science, University of Latvia
Format:: text/sgml
Type:: corpus
Language:: Latvian
Description:: Morphologically tagged and lemmatized text sample (> 16 000 running words), publicly available via the Bonito interface and at http://www.korpuss.lv/uzzinas/plans_ledus.pdf
Rights:: Not specified

37. Corpus de parlants catalanoparlants de La Canonja en temps real (TR)

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: corpus
Subject:: oral corpus
Language:: Catalan
Description:: Oral corpus containing 10 sociolinguistic interviews carried out in La Canonja (Tarragona).
Rights:: Not specified

38. Corpus Nederlandse Gebarentaal (CNGT)

Publisher:: Radboud University Nijmegen
Type:: corpus
Subject:: Linguistics and language technology
Description:: The Corpus NGT is a collection of data from deaf signers using Sign Language of the Netherlands (NGT). The data consist of recordings with multiple synchronised video cameras, accompanied by gloss and translation annotations.
Rights:: Creative Commons BY-NC-SA 3.0 NL license and http://creativecommons.org/licenses/by-nc-sa/3.0/nl/

39. Corpus of contemporary blogs

Creator:: Grác, Marek
Publisher:: Masaryk University, NLP Centre
Type:: text and corpus
Subject:: corpus, blogs, annotation, annotators, sentences, and machine learning
Language:: Czech
Description:: At the NLP Centre, text is currently divided into sentences with a rule-based tool. To produce enough training data for machine learning, annotators manually split the corpus of contemporary text CBB.blog (1 million tokens) into sentences. Each file contains one hundredth of the whole corpus, and all data were processed in parallel by two annotators. The corpus was created from ten contemporary blogs: hintzu.otaku.cz, modnipeklo.cz, bloc.cz, aleneprokopova.blogspot.com, blog.aktualne.cz, fuchsova.blog.onaidnes.cz, havlik.blog.idnes.cz, blog.aktualne.centrum.cz, klusak.blogspot.cz, and myego.cz/welldone.
Rights:: Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0), http://creativecommons.org/licenses/by-nc-nd/3.0/, and PUB

40. Corpus OVER

Creator:: Col, Gilles
Publisher:: Université de Poitiers
Type:: text and corpus
Subject:: over, semantics, instruction, and corpus-data
Language:: English
Description:: Many studies in cognitive linguistics have analysed the semantics of 'over', notably the semantics associated with 'over' as a preposition. Most of them conclude that 'over' is polysemic and that this polysemy is best described by a radial semantic network showing the relationships between the different meanings of the word. We suggest, on the contrary, that the meanings of 'over' are highly dependent on the utterance context in which its occurrences are embedded, and consequently that the meaning of 'over' itself is under-specified rather than polysemic. Moreover, to give a more accurate account of the apparently wide range of meanings of 'over' in context, we ought to take into account the other uses of this unit: as an adverb and particle, and not only as a preposition. In this paper, we provide a corpus-based description of 'over' which leads us to propose a monosemic definition. To achieve such a description, we used a small dataset of 326 randomly selected sentences containing 'over' in various positions in the sentence and corresponding to various categories.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
