Harvested from: LINDAT/CLARIAH-CZ repository / Language: English - LINDAT/CLARIAH-CZ Catalog Search Results


51. Coreference in Universal Dependencies 1.2 (CorefUD 1.2)

Creator:: Popel, Martin, Novák, Michal, Žabokrtský, Zdeněk, Zeman, Daniel, Nedoluzhko, Anna, Acar, Kutay, Bamman, David, Bourgonje, Peter, Cinková, Silvie, Eckhoff, Hanne, Cebiroğlu Eryiğit, Gülşen, Hajič, Jan, Hardmeier, Christian, Haug, Dag, Jørgensen, Tollef, Kåsen, Andre, Krielke, Pauline, Landragin, Frédéric, Lapshinova-Koltunski, Ekaterina, Mæhlum, Petter, Martí, M. Antònia, Mikulová, Marie, Nøklestad, Anders, Ogrodniczuk, Maciej, Øvrelid, Lilja, Pamay Arslan, Tuğba, Recasens, Marta, Solberg, Per Erik, Stede, Manfred, Straka, Milan, Swanson, Daniel, Toldova, Svetlana, Vadász, Noémi, Velldal, Erik, Vincze, Veronika, Zeldes, Amir, and Žitkus, Voldemaras
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: coreference, bridging relations, harmonized annotation, dependency, and treebank
Language:: Ancient Greek (to 1453), Ancient Hebrew, Catalan, Czech, English, French, German, Hungarian, Lithuanian, Norwegian, Church Slavic, Polish, Russian, Spanish, and Turkish
Description:: CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 1.2 consists of 25 datasets for 16 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The publicly available edition is distributed via LINDAT/CLARIAH-CZ and contains 21 datasets for 15 languages (1 dataset for Ancient Greek, 1 for Ancient Hebrew, 1 for Catalan, 2 for Czech, 3 for English, 1 for French, 2 for German, 2 for Hungarian, 1 for Lithuanian, 2 for Norwegian, 1 for Old Church Slavonic, 1 for Polish, 1 for Russian, 1 for Spanish, and 1 for Turkish), excluding the test data. The non-public edition is available internally to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch and 3 for English), which we are not allowed to distribute due to their original license limitations. It also contains the test data portions for all datasets. When using any of the harmonized datasets, please read its license (placed in the same directory as the data) and also cite the original data resource. Compared to the previous version 1.1, version 1.2 adds new languages and corpora, namely Ancient_Greek-PROIEL, Ancient_Hebrew-PTNK, English-LitBank, and Old_Church_Slavonic-PROIEL.
In addition, English-GUM and Turkish-ITCC have been updated to newer versions, the conversion of zeros in Polish-PCC has been improved, and the conversion pipelines for several other datasets have been refined (a list of all changes in each dataset can be found in the corresponding README file).
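Since the coreference and bridging annotation lives in the MISC column of the CoNLL-U files, a reader only needs to split that column into attribute-value pairs to get at it. The sketch below assumes standard CoNLL-U conventions (10 tab-separated columns, MISC last, `_` when empty, pairs separated by `|`). The attribute names in `COREF_KEYS` and the `Entity` value in the example are assumptions made for illustration; consult the CorefUD documentation for the authoritative names and value syntax. `parse_misc` and `coref_attributes` are hypothetical helpers, not part of any CorefUD tooling.

```python
from __future__ import annotations

# ASSUMPTION: names of the coreference/bridging attributes stored in MISC;
# check the CorefUD documentation for the authoritative list.
COREF_KEYS = ("Entity", "Bridge", "SplitAnte")


def parse_misc(misc: str) -> dict[str, str]:
    """Split a CoNLL-U MISC field ("_" or "Key=Value|Key=Value") into a dict."""
    if misc == "_":
        return {}
    pairs: dict[str, str] = {}
    for item in misc.split("|"):
        # partition on the first "=" only, so values may themselves contain "="
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def coref_attributes(token_line: str) -> dict[str, str]:
    """Return only the (assumed) coreference-related MISC attributes of one token line."""
    columns = token_line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError("a CoNLL-U token line has exactly 10 tab-separated columns")
    misc = parse_misc(columns[9])  # MISC is the 10th (last) column
    return {key: value for key, value in misc.items() if key in COREF_KEYS}


# Hypothetical token line; the Entity value is made up for illustration.
line = "1\tMary\tMary\tPROPN\t_\tNumber=Sing\t2\tnsubj\t_\tEntity=(e1-person-1)|SpaceAfter=No"
print(coref_attributes(line))  # {'Entity': '(e1-person-1)'}
```

Non-coreference attributes such as `SpaceAfter` are filtered out, so the same helper can be applied to every token line of a file.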
Rights:: Licence CorefUD v1.2, https://lindat.mff.cuni.cz/repository/xmlui/page/license-corefud-1.2, and PUB

52. CorPipe 23 multilingual CorefUD 1.1 model (corpipe23-corefud1.1-231206)

Creator:: Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: coreference resolution, CorPipe, and CorefUD
Language:: Catalan, Czech, German, English, Spanish, French, Hungarian, Lithuanian, Norwegian Bokmål, Norwegian Nynorsk, Polish, Russian, and Turkish
Description:: The `corpipe23-corefud1.1-231206` is an `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released under the CC BY-NC-SA 4.0 license. The model is language-agnostic (no _corpus id_ on input), so it can be used to predict coreference in any `mT5` language (for zero-shot evaluation, see the paper). Note, however, that empty nodes must already be present in the input; they are not predicted (the same setting as in the CRAC23 shared task).
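Because the model does not predict empty nodes, it can be worth checking whether an input file already contains them before running CorPipe. A minimal sketch, assuming standard CoNLL-U conventions, where empty nodes carry decimal IDs such as `1.1` and multiword-token ranges look like `1-2`; `empty_node_ids` is a hypothetical helper, not part of CorPipe.

```python
from __future__ import annotations


def empty_node_ids(conllu_text: str) -> list[str]:
    """Collect the IDs of empty nodes (decimal IDs such as "1.1") in CoNLL-U text."""
    ids: list[str] = []
    for line in conllu_text.splitlines():
        # skip blank sentence separators and comment lines
        if not line.strip() or line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        # multiword-token ranges look like "1-2"; empty nodes look like "N.M"
        if "." in token_id:
            ids.append(token_id)
    return ids


# Hypothetical sentence with one empty node (a dropped subject) after word 1.
sample = "\n".join([
    "# text = Went home",
    "1\tWent\tgo\tVERB\t_\t_\t0\troot\t_\t_",
    "1.1\t_\t_\tPRON\t_\t_\t_\t_\t1:nsubj\t_",
    "2\thome\thome\tADV\t_\t_\t1\tadvmod\t_\t_",
])
print(empty_node_ids(sample))  # ['1.1']
```

An empty result does not prove the file is wrong, since many sentences legitimately have no empty nodes; it only shows that none were supplied.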
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

53. Corpus bilingüe d’alternança de llengües (codeswitching) [Bilingual corpus of language alternation]

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: corpus
Subject:: speech corpus
Language:: Catalan, English, and Spanish
Description:: Eight interactive recordings of group dynamics with bilingual speakers whose first language (L1) is either English or Catalan/Spanish.
Rights:: Not specified

54. Corpus CLUVI

Publisher:: TALG Research Group (University of Vigo)
Type:: corpus
Language:: Basque, Catalan, English, French, Galician, German, Portuguese, and Spanish
Description:: Parallel corpus, 22 million words
Rights:: Not specified

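55. Corpus d’extractes de gravacions d’Internet en temps aparent (TA) i temps real (TR) amb finalitats forenses [Corpus of excerpts from Internet recordings in apparent time (TA) and real time (TR) for forensic purposes]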

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: corpus
Subject:: corpus
Language:: English
Rights:: Not specified

56. Corpus de narratives d’angloparlants immigrats a Espanya en temps aparent (TA) [Corpus of narratives by English-speaking immigrants in Spain in apparent time (TA)]

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: corpus
Language:: English
Description:: Oral corpus containing 166 narratives in English elicited by means of Labovian techniques. Participants from the UK (England, Wales, Scotland), Ireland, USA, Australia and South Africa.
Rights:: Not specified

57. Corpus of Early English Correspondence Sampler (CEECS)

Publisher:: University of Helsinki
Format:: text/plain
Type:: corpus
Language:: English
Description:: Personal correspondence from England written between 1418 and 1680, compiled as a resource for historical sociolinguistics.
Rights:: Not specified

58. Corpus OVER

Creator:: Col, Gilles
Publisher:: Université de Poitiers
Type:: text and corpus
Subject:: over, semantics, instruction, and corpus-data
Language:: English
Description:: Many studies in cognitive linguistics have analysed the semantics of 'over', notably the semantics associated with 'over' as a preposition. Most of them conclude that 'over' is polysemic and that this polysemy should be described by means of a radial semantic network showing the relationships between the different meanings of the word. We suggest, on the contrary, that the meanings of 'over' depend heavily on the utterance context in which its occurrences are embedded, and consequently that the meaning of 'over' itself is under-specified rather than polysemic. Moreover, to give a more accurate account of the apparently wide range of meanings of 'over' in context, we must take into account the other uses of this unit, as an adverb and as a particle, and not only as a preposition. In this paper, we provide a corpus-based description of 'over' that leads us to propose a monosemic definition. To achieve such a description, we used a small dataset of 326 randomly selected sentences containing 'over' in various positions and corresponding to various categories.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

59. Corpus Tècnic de l'IULA [IULA Technical Corpus]

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: corpus
Language:: Catalan, English, and Spanish
Description:: Domain-specific corpus (law, economy, computing, medicine and environment, plus a contrastive corpus drawn from the press); English 3.3 M tokens, Spanish 33 M tokens, Catalan 19 M tokens; EAGLES POS tagset.
Rights:: Not specified

60. CorpusExplorer

Creator:: Rüdiger, Jan Oliver
Publisher:: Jan Oliver Rüdiger
Type:: tool and toolService
Subject:: Corpus Linguistics, NLP, conll, tei, XML, nlp, Natural Language Processing, linguistics, Linguistics, Computational Linguistics, corpus processing, tagger, POS tagger, lemmatization, text cleaning, CommonCrawl, epub, JSON, Twitter, Pandoc, Wikipedia, digital data, DTA, DSpin, MySQL, ElasticSearch, TextGrid, text corpora, TigerXML, and WeblichtXML
Language:: German, English, French, Italian, Dutch, Spanish, Polish, Arabic, Chinese, and Portuguese
Description:: Software for corpus linguists and text/data mining enthusiasts. CorpusExplorer combines over 45 interactive visualizations in a user-friendly interface. Routine tasks such as text acquisition, cleaning, and tagging are fully automated. The simple interface supports use in university teaching and helps users and students reach substantial results quickly. CorpusExplorer supports many standards (XML, CSV, JSON, R, etc.) and also offers its own software development kit (SDK). Source code is available at https://github.com/notesjor/corpusexplorer2.0
Rights:: Not specified
