Harvested from: LINDAT/CLARIAH-CZ repository - LINDAT/CLARIAH-CZ Catalog Search Results

741. PAWS

Creator:: Nedoluzhko, Anna, Novák, Michal, and Ogrodniczuk, Maciej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: multilingual, parallel corpus, coreference, and tectogrammatics
Language:: English, Czech, Russian, and Polish
Description:: PAWS is a multi-lingual parallel treebank with coreference annotation. It consists of English texts from the Wall Street Journal translated into Czech, Russian and Polish. In addition, the texts are syntactically parsed and word-aligned. PAWS is based on PCEDT 2.0 and continues the tradition of multilingual treebanks with coreference annotation. PAWS offers linguistic material that can be further leveraged in cross-lingual studies, especially on coreference.
Rights:: PAWS License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-PAWS, and RES

742. pdftotext

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Format conversion service: .pdf to .txt converter
Rights:: Not specified

743. PDT-Vallex: Czech Valency lexicon linked to treebanks

Creator:: Urešová, Zdeňka, Štěpánek, Jan, Hajič, Jan, Panevová, Jarmila, and Mikulová, Marie
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: annotation, corpora, data, lexicon, semantics, valency, and PDT
Language:: Czech
Description:: The valency lexicon PDT-Vallex has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT). It contains over 11000 valency frames for more than 7000 verbs which occurred in the PDT or PCEDT. It is available in electronically processable format (XML) together with the aforementioned treebanks (to be viewed and edited by TrEd, the PDT/PCEDT main annotation tool), and also in more human readable form including corpus examples (see the WEBSITE link below). The main feature of the lexicon is its linking to the annotated corpora - each occurrence of each verb is linked to the appropriate valency frame with additional (generalized) information about its usage and surface morphosyntactic form alternatives.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

744. PDT-Vallex: Czech Valency lexicon linked to treebanks 4.0 (PDT-Vallex 4.0)

Creator:: Urešová, Zdeňka, Bémová, Alevtina, Fučíková, Eva, Hajič, Jan, Kolářová, Veronika, Mikulová, Marie, Pajas, Petr, Panevová, Jarmila, and Štěpánek, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, computationalLexicon, and lexicalConceptualResource
Subject:: verbal valency, valency, annotation, linguistic data, lexicon, lexical semantics, and PDT
Language:: Czech
Description:: The valency lexicon PDT-Vallex 4.0 has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT, the spoken language corpus (PDTSC) and the corpus of user-generated texts in the project Faust). It contains over 14500 valency frames for almost 8500 verbs which occurred in the PDT, PCEDT, PDTSC and Faust corpora. In addition, there are nouns, adjectives and adverbs, linked from the PDT part only, increasing the total to over 17000 valency frames for 13000 words. All the corpora were published in 2020 as the PDT-C 1.0 corpus with the PDT-Vallex 4.0 dictionary included; this is a copy of the dictionary published as a separate item for those not interested in the corpora themselves. It is available in electronically processable format (XML), and also in more human-readable form including corpus examples (see the WEBSITE link below, and the links to its main publications elsewhere in this metadata). The main feature of the lexicon is its linking to the annotated corpora - each occurrence of each verb is linked to the appropriate valency frame with additional (generalized) information about its usage and surface morphosyntactic form alternatives. It replaces the previously published unversioned edition of PDT-Vallex from 2014.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

745. People of the Center corpus

Type:: corpus
Description:: Documentation of the People of the Center project (DoBeS project)
Rights:: Code of conduct

746. Persian Morphologically Segmented Lexicon 0.5

Creator:: Ansari, Ebrahim, Žabokrtský, Zdeněk, Haghdoost, Hamid, and Nikravesh, Mahshid
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: morphological analysis and lemmatization
Language:: Persian
Description:: This dataset includes 45300 Persian word forms which are manually segmented into sequences of morphemes.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

747. PhilologEg

Publisher:: University of St. Andrews
Type:: toolService
Description:: A tool for representing and exchanging electronic resources on texts in the Ancient Egyptian language.
Rights:: Not specified

748. Phonetic Corpus of Estonian Spontaneous Speech (online search engine)

Publisher:: University of Tartu
Type:: corpus
Subject:: speech corpus
Language:: Estonian
Description:: Studio recordings of spontaneous Estonian, segmented phonetically on word, sound, and other linguistic levels. The current size is about 22 hours of speech (155,000 words). The online search engine searches word-level segments and returns matching 2-second sequences of sound and segmentation.
Rights:: Not specified

749. Pierer's Universal-Lexikon

Type:: lexicalConceptualResource
Subject:: German studies
Language:: German
Description:: 4th ed., 1857-1865; word-accurate page concordance to the printed edition; by its own claim, stated in the subtitle, an "encyclopedic dictionary"
Rights:: Not specified

750. Plain-Moses-Chimera

Creator:: Bojar, Ondřej and Tamchyna, Aleš
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and suiteOfTools
Subject:: moses and machine translation
Language:: English and Czech
Description:: Statistical component of Chimera, a state-of-the-art MT system. Project DF12P01OVV022 of the Ministry of Culture of the Czech Republic (NAKI -- Amalach).
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

