Harvested from: LINDAT/CLARIAH-CZ repository / Original context has metadata only: false

Start Over Original context has metadata only false Harvested from LINDAT/CLARIAH-CZ repository

1071. Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)

Creator:: Guillaume, Bruno, Ramisch, Carlos, Waszczuk, Jakub, Monti, Johanna, Di Buono, Maria Pia, Sangati, Federico, Speranza, Giulia, Carlino, Carola, Güngör, Tunga, Yirmibeşoğlu, Zeynep, Sak, Haşim, Saraçlar, Murat, Giouli, Voula, Foufi, Vassiliki, Ramisch, Renata, Rademaker, Alexandre, Vale, Oto, Wilkens, Rodrigo, Candito, Marie, Crabbé, Benoît, Segonne, Vincent, Liebeskind, Chaya, Stymne, Sara, Hajič, Jan, Ginter, Filip, Luotolahti, Juhani, Straka, Milan, Zeman, Daniel, Barbu Mititelu, Verginica, Cristescu, Mihaela, Vaidya, Ashwini, Bhatia, Archna, Lichte, Timm, Ehren, Rafael, Jiang, Menghan, Xu, Hongzhi, Walsh, Abigail, Irimia, Elena, and Dowling, Meghan
Publisher:: PARSEME
Type:: text and corpus
Subject:: morphosyntactic annotation, dependency trees, and morphological analysis
Language:: German, Modern Greek (1453-), Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Portuguese, Romanian, Swedish, Turkish, and Chinese
Description:: This multilingual resource contains corpora for 14 languages, gathered at the occasion of the 1.2 edition of the PARSEME Shared Task on semi-supervised Identification of Verbal MWEs (2020). These corpora were meant to serve as additional "raw" corpora, to help discovering unseen verbal MWEs. The corpora are provided in CONLL-U (https://universaldependencies.org/format.html) format. They contain morphosyntactic annotations (parts of speech, lemmas, morphological features, and syntactic dependencies). Depending on the language, the information comes from treebanks (mostly Universal Dependencies v2.x) or from automatic parsers trained on UD v2.x treebanks (e.g., UDPipe). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For the 1.2 shared task edition, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CONLL-U format. Morphological and syntactic information – not necessarily using UD tagsets – including parts of speech, lemmas, morphological features and/or syntactic dependencies are also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
Rights:: PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2-raw, and PUB

1072. Morphological Analyzer for Shipibo-Konibo

Creator:: Cardenas Acosta, Ronald and Zeman, Daniel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: morphological analyzer and finite state transducer
Language:: Shipibo-Conibo
Description:: This tool is the first morphological analyzer ever for this language. The analyzer is a FST that produces all possible segmentations and tagging sequences in a word-by-word fashion.
Rights:: GNU General Public Licence, version 3, http://opensource.org/licenses/GPL-3.0, and PUB

1073. Motion Encoding Lexicalization Patterns: Portuguese and English Learners

Creator:: Costa-Silva, Jean
Publisher:: University of Georgia
Type:: text, other, and lexicalConceptualResource
Subject:: motion encoding, language acquisition, Portuguese, English, and lexicalization patterns
Language:: English
Description:: General Information: Data collector: Jean Costa Silva (University of Georgia) Date of collection: September-December 2022 Manner of collection: Online questionnaire via Qualtrics Funding: No
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

1074. MSTperl delexicalized parser transfer scripts and configuration files

Creator:: Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: suiteOfTools and toolService
Subject:: parsing and delexicalized parser transfer
Description:: This is a set of MSTperl parser configuration files and scripts for delexicalized parser transfer. They were used in the work reported in arXiv:1506.04897 (http://arxiv.org/abs/1506.04897), as well as several related papers. The MSTperl parser is available at http://hdl.handle.net/11234/1-1480
Rights:: GNU General Public Licence, version 3, http://opensource.org/licenses/GPL-3.0, and PUB

1075. MSTperl parser

Creator:: Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and tool
Subject:: parser, NLP, Treex, parsing, and dependency
Language:: Czech and English
Description:: MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html). MST parser (Maximum Spanning Tree parser) is a state-of-the-art natural language dependency parser -- a tool that takes a sentence and returns its dependency tree. In MSTperl, only some functionality was implemented; the limitations include the following: the parser is a non-projective one, curently with no possibility of enforcing the requirement of projectivity of the parse trees; only first-order features are supported, i.e. no second-order or third-order features are possible; the implementation of MIRA is that of a single-best MIRA, with a closed-form update instead of using quadratic programming. On the other hand, the parser supports several advanced features: parallel features, i.e. enriching the parser input with word-aligned sentence in other language; adding large-scale information, i.e. the feature set enriched with features corresponding to pointwise mutual information of word pairs in a large corpus (CzEng). The MSTperl parser is tuned for parsing Czech. Trained models are available for Czech, English and German. We can train the parser for other languages on demand, or you can train it yourself -- the guidelines are part of the documentation. The parser, together with detailed documentation, is avalable on CPAN (http://search.cpan.org/~rur/Treex-Parser-MSTperl/). and The research has been supported by the EU Seventh Framework Programme under grant agreement 247762 (Faust), and by the grants GAUK116310 and GA201/09/H057.
Rights:: Artistic License 2.0, http://opensource.org/licenses/Artistic-2.0, and PUB

1076. MSTperl parser (2015-05-19)

Creator:: Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and tool
Subject:: parser, NLP, Treex, parsing, and dependency
Language:: Czech and English
Description:: MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html). MST parser (Maximum Spanning Tree parser) is a state-of-the-art natural language dependency parser -- a tool that takes a sentence and returns its dependency tree. In MSTperl, only some functionality was implemented; the limitations include the following: the parser is a non-projective one, curently with no possibility of enforcing the requirement of projectivity of the parse trees; only first-order features are supported, i.e. no second-order or third-order features are possible; the implementation of MIRA is that of a single-best MIRA, with a closed-form update instead of using quadratic programming. On the other hand, the parser supports several advanced features: parallel features, i.e. enriching the parser input with word-aligned sentence in other language; adding large-scale information, i.e. the feature set enriched with features corresponding to pointwise mutual information of word pairs in a large corpus (CzEng); weighted/unweighted parser model interpolation; combination of several instances of the MSTperl parser (through MST algorithm); combination of several existing parses from any parsers (through MST algorithm). The MSTperl parser is tuned for parsing Czech. Trained models are available for Czech, English and German. We can train the parser for other languages on demand, or you can train it yourself -- the guidelines are part of the documentation. The parser, together with detailed documentation, is avalable on CPAN (http://search.cpan.org/~rur/Treex-Parser-MSTperl/). and The research has been supported by the EU Seventh Framework Programme under grant agreement 247762 (Faust), and by the grants GAUK116310 and GA201/09/H057.
Rights:: Artistic License 2.0, http://opensource.org/licenses/Artistic-2.0, and PUB

1077. MTMonkey

Creator:: Tamchyna, Aleš, Dušek, Ondřej, and Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and infrastructure
Subject:: machine translation, distributed computing, web service, and infrastructure
Description:: MTMonkey is a web service which handles and distributes JSON-encoded HTTP requests for machine translation (MT) among multiple machines running an MT system, including text pre- and post processing. It consists of an application server and remote workers which handle text processing and communicate translation requests to MT systems. The communication between the application server and the workers is based on the XML-RPC protocol. and The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 257528 (KHRESMOI). This work has been using language resources developed and/or stored and/or distributed by the LINDAT-Clarin project of the Ministry of Education of the Czech Republic (project LM2010013). This work has been supported by the AMALACH grant (DF12P01OVV02) of the Ministry of Culture of the Czech Republic.
Rights:: Apache License 2.0, http://opensource.org/licenses/Apache-2.0, and PUB

1078. MUDr. Šimsa's Neurorehabilitation Clinic in Krč

Creator:: Kinofa and Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: sanatorium nervové, kočár tažený koňmi, znak Kinofa, bazén otevřený, úbory koupací, adiktologie, Galerie osobností, Places::Praha::Krč::Sanatorium Dr. Šimsy, People::Šimsa Jan (1865-1945), and Zdravotní a sociální péče
Language:: No linguistic content
Description:: The segment shows the neurorehabilitation clinic of neurologist and addiction treatment pioneer Jan Šimsa, which he ran from 1901 to 1916 in Prague's Krč district. The central building, called Vita Nova, was designed and built in 1909 by architect Bohuslav Černý. Caught on camera are the arriving guests, patients exercising outdoors, and female patients swimming in the outdoor pool.
Rights:: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB

1079. Multilingual corpus of literal occurrences of multiword expressions

Creator:: Savary, Agata, Cordeiro, Silvio Ricardo, Lichte, Timm, Ramisch, Carlos, Iñurrieta, Uxoa, and Giouli, Voula
Publisher:: PARSEME
Type:: text and corpus
Subject:: verbal multiword expressions, literal occurrence, and idiomaticity rate
Language:: Basque, German, Modern Greek (1453-), Polish, and Portuguese
Description:: The corpus contains sentences with idiomatic, literal and coincidental occurrences of verbal multiword expressions (VMWEs) in Basque, German, Greek, Polish and Portuguese. The source corpus is the PARSEME multilingual corpus of VMWEs v 1.1 (cf. http://hdl.handle.net/11372/LRT-2842). The sentences with VMWEs were extracted from the source corpus and potential co-occurrences of the same lexemes were automatically extracted from the same corpus. These candidates were then manually annotated by native experts into 6 classes, including literal and coincidental occurrences, as well as various annotation errors. The construction of the corpus is described by the following publication: Agata Savary, Silvio Ricardo Cordeiro, Timm Lichte, Carlos Ramisch, Uxoa Iñurrieta, Voula Giouli (forthcoming) "Literal occurrences of multiword expressions: Rare birds that cause a stir", to appear in Prague Bulletin of Mathematical Linguistics.
Rights:: License agreement for The Multilingual corpus of literal occurrences of multiword expressions, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-literal, and PUB

1080. Multilingual static embeddings for Verbal Multiword Expressions trained on PARSEME raw corpora

Creator:: Estève, Louis Clément, Savary, Agata, and Lavergne, Thomas
Publisher:: Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique
Type:: text, computationalLexicon, and lexicalConceptualResource
Subject:: verbal multiword expressions, word embeddings, and word2vec
Language:: German, Modern Greek (1453-), Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Portuguese, Romanian, Swedish, Turkish, and Chinese
Description:: This resource is a set of 14 vector spaces for single words and Verbal Multiword Expressions (VMWEs) in different languages (German, Greek, Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Brazilian Portuguese, Romanian, Swedish, Turkish, Chinese). They were trained with the Word2Vec algorithm, in its skip-gram version, on PARSEME raw corpora automatically annotated for morpho-syntax (http://hdl.handle.net/11234/1-3367). These corpora were annotated by Seen2Seen, a rule-based VMWE identifier, one of the leading tools of the PARSEME shared task version 1.2. VMWE tokens were merged into single tokens. The format of the vector space files is that of the original Word2Vec implementation by Mikolov et al. (2013), i.e. a binary format. For compression, bzip2 was used.
Rights:: PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2-raw, and PUB

« Previous
Next »
1
2
…
104
105
106
107
108
109
110
111
112
…
159
160

1071. Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)

1072. Morphological Analyzer for Shipibo-Konibo

1073. Motion Encoding Lexicalization Patterns: Portuguese and English Learners

1074. MSTperl delexicalized parser transfer scripts and configuration files

1075. MSTperl parser

1076. MSTperl parser (2015-05-19)

1077. MTMonkey

1078. MUDr. Šimsa's Neurorehabilitation Clinic in Krč

1079. Multilingual corpus of literal occurrences of multiword expressions

1080. Multilingual static embeddings for Verbal Multiword Expressions trained on PARSEME raw corpora

Limit your search

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Show values starting with

Coverage

Creator

Show values starting with

Language

Show values starting with

Publisher

Show values starting with

Rights

Show values starting with

Subject

Show values starting with

Type

Show values starting with

Date

Original context has metadata only

Harvested from