Original context has metadata only: false / Subject: morphological analysis

11. Korpus ORAL: sestavení, lemmatizace a morfologické značkování [The ORAL corpus: compilation, lemmatization and morphological tagging]

Creator:: Kopřivová, Marie, Komrsková, Zuzana, Lukeš, David, and Poukarová, Petra
Format:: unmediated and volume
Type:: model:article and TEXT
Subject:: spoken Czech, spoken language corpora, lemmatization, tagging, morphological analysis, mluvená čeština, korpusy mluveného jazyka, lemmatizace, tagování, and morfologická analýza
Language:: Czech
Description:: The goal of this paper is to provide an overview of the structure and contents of the soon-to-be available ORAL corpus, which combines previously published corpora (ORAL2006, ORAL2008 and ORAL2013) with newly transcribed material into a single conveniently accessible and more richly annotated resource, about 6 million running words in length. The recordings and corresponding transcripts span a decade between 2002 and 2011; most of them capture interactions of mutually well-acquainted speakers, in informal situations and natural settings. The corpus is complemented by a marginal portion of more formal data, mostly public talks. It is tagged and lemmatized, and an effort was made to adapt existing tools (targeted at written language) to yield better results on spoken data. We hope the availability of such a resource will spawn further discussions on the morphological and syntactic analysis of spoken language, perhaps resulting in more radical departures in the future from the part-of-speech classification inherited from the linguistic analysis of written language.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

12. MORFO

Creator:: Kolovratník, David
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService
Subject:: morphological analysis
Language:: Czech
Description:: The MORFO system for morphological analysis of Czech consists of four units: the analyzer, the generator, the dictionary editor, and the library with the shared source code for handling dictionary objects.
Rights:: PDT 2.0 License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-pdt2, and ACA

13. Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)

Creator:: Guillaume, Bruno, Ramisch, Carlos, Waszczuk, Jakub, Monti, Johanna, Di Buono, Maria Pia, Sangati, Federico, Speranza, Giulia, Carlino, Carola, Güngör, Tunga, Yirmibeşoğlu, Zeynep, Sak, Haşim, Saraçlar, Murat, Giouli, Voula, Foufi, Vassiliki, Ramisch, Renata, Rademaker, Alexandre, Vale, Oto, Wilkens, Rodrigo, Candito, Marie, Crabbé, Benoît, Segonne, Vincent, Liebeskind, Chaya, Stymne, Sara, Hajič, Jan, Ginter, Filip, Luotolahti, Juhani, Straka, Milan, Zeman, Daniel, Barbu Mititelu, Verginica, Cristescu, Mihaela, Vaidya, Ashwini, Bhatia, Archna, Lichte, Timm, Ehren, Rafael, Jiang, Menghan, Xu, Hongzhi, Walsh, Abigail, Irimia, Elena, and Dowling, Meghan
Publisher:: PARSEME
Type:: text and corpus
Subject:: morphosyntactic annotation, dependency trees, and morphological analysis
Language:: German, Modern Greek (1453-), Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Portuguese, Romanian, Swedish, Turkish, and Chinese
Description:: This multilingual resource contains corpora for 14 languages, gathered on the occasion of the 1.2 edition of the PARSEME Shared Task on semi-supervised Identification of Verbal MWEs (2020). These corpora were meant to serve as additional "raw" corpora, to help discover unseen verbal MWEs. The corpora are provided in CoNLL-U (https://universaldependencies.org/format.html) format. They contain morphosyntactic annotations (parts of speech, lemmas, morphological features, and syntactic dependencies). Depending on the language, the information comes from treebanks (mostly Universal Dependencies v2.x) or from automatic parsers trained on UD v2.x treebanks (e.g., UDPipe). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For the 1.2 shared task edition, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information – not necessarily using UD tagsets – including parts of speech, lemmas, morphological features and/or syntactic dependencies is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
Rights:: PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.2-raw, and PUB

14. Open morphology of Finnish

Creator:: Pirinen, Tommi A, Listenmaa, Inari, Johnson, Ryan, Tyers, Francis M., and Kuokkala, Juha
Publisher:: University of Helsinki
Type:: tool and toolService
Subject:: morphological analysis and morphological dictionary
Language:: Finnish
Description:: Omorfi is a free and open source project containing various tools and data for handling Finnish texts in a linguistically motivated manner. The main components of this repository are: 1) a lexical database containing hundreds of thousands of words (cf. lexical statistics), 2) a collection of scripts to convert the lexical database into formats used by upstream NLP tools (cf. lexical processing), 3) an autotools setup to build and install (or package, or deploy) the scripts, the database, and simple APIs / convenience processing tools, and 4) a collection of relatively simple APIs for a selection of languages and scripts to apply the NLP tools and access the database.
Rights:: GNU General Public Licence, version 3, http://opensource.org/licenses/GPL-3.0, and PUB

15. Slovak MorphoDiTa Models 170914

Creator:: Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, mlmodel, and languageDescription
Subject:: MorphoDiTa, Slovak, morphological analysis, morphological generation, and PoS tagging
Language:: Slovak
Description:: Slovak models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from MorfFlex SK 170914 and the PoS tagger is trained on automatically translated Prague Dependency Treebank 3.0 (PDT).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

16. Taxonomic revision of the genus Triaenops (Chiroptera: Hipposideridae) with description of a new species from southern Arabia and definitions of a new genus and tribe

Creator:: Benda, Petr and Vallo, Peter
Type:: article and TEXT
Subject:: Triaenops parvus sp. nov., Paratriaenops gen. nov., Triaenopini trib. nov., morphological analysis, genetic analysis, cytochrome b, Middle East, Afrotropics, and Madagascar
Language:: English
Description:: The genus Triaenops has been considered monospecific in its African and Middle Eastern range (T. persicus), while three other species have been recognised as endemic to Madagascar (T. menamena, T. furculus, and T. auritus), and another to the western Seychelles (T. pauliani). We analysed representative samples of T. persicus from East Africa and the Middle East using both morphological and molecular genetics approaches and compared them with most of the available type material of species of this genus. Morphological comparisons revealed four distinct morphotypes in the set of examined specimens; one in Africa, the others in the Middle East. The Middle Eastern morphotypes differed mainly in size, while the allopatric African form showed differences in skull shape. Two of three Arabian morphotypes occur in sympatry. Cytochrome b gene-based molecular analysis revealed significant divergences (K2P distance 6.4–8.1% in complete cyt b sequence) among most of the morphotypes. Therefore, we propose a split of the current T. persicus rank into three species: T. afer in Africa, and T. persicus and T. parvus sp. nov. in the Middle East. The results of the molecular analysis also indicated relatively close proximity of the Malagasy T. menamena to Arabian T. persicus, suggesting a northern route of colonisation of Madagascar from populations from the Middle East or north-eastern Africa as a plausible alternative to presumed colonisation from East Africa. Due to a considerable genetic distance (21.6–26.2% in 731 bp sequence of cyt b) and substantial morphological differences from the continental forms of Triaenops as well as from Malagasy T. menamena, we propose generic status (Paratriaenops gen. nov.) for the group of Malagasy species, T. furculus, T. auritus, and T. pauliani. We separated the genera Triaenops and Paratriaenops gen. nov. from other hipposiderid bats into Triaenopini trib. nov., recognising their isolated position within the family Hipposideridae Lydekker, 1891.
Rights:: http://creativecommons.org/licenses/by-nc-sa/4.0/

17. Word representations for multiple languages

Creator:: Müller, Thomas and Schütze, Hinrich
Publisher:: Center for Information and Language Processing, University of Munich
Type:: text and corpus
Subject:: morphological dictionary, morphological analysis, and PoS tagging
Language:: English, German, Latin, Hungarian, Spanish, and Czech
Description:: Dictionaries with different representations for various languages. Representations include Brown clusters of different sizes and morphological dictionaries extracted using different morphological analyzers. All representations cover the most frequent 250,000 word types on the Wikipedia version of the respective language. Analyzers used: MAGYARLANC (Hungarian, Zsibrita et al. (2013)), FREELING (English and Spanish, Padro and Stanilovsky (2012)), SMOR (German, Schmid et al. (2004)), an MA from Charles University (Czech, Hajic (2001)) and LATMOR (Latin, Springmann et al. (2014)).
Rights:: Creative Commons - Attribution 3.0 Unported (CC BY 3.0), http://creativecommons.org/licenses/by/3.0/, and PUB
