Harvested from: LINDAT/CLARIAH-CZ repository / Original context has metadata only: false / Type: text

41. Arabic Phonetic Rules

Creator:: Mustafa, Ebtihal and Bouzoubaa, Karim
Publisher:: languages journal
Type:: text, other, and lexicalConceptualResource
Subject:: Arabic and phonetic rules
Language:: Arabic
Description:: Description: this xml file describes the Arabic phonetic constraints (rules) resulting from the analysis of the lexicons(Taj Alarous, Al ain, Lisan Al arab, Alwassit and almoassir ). These rules are to be applied to Arabic roots and are classified into a number of categories. Each category has a certain type of constraints as follow: The first category defines that the root must not consist of three identical letters. The second category defines that the root must not start with two repeating letters. The third category lists the letters that must not occur in the same root, regardless of their order. The fourth category lists the letters that may not be used together in a certain order in a root. ISLRN: 190-535-098-473-3
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

42. Arabic Proclitics Lexicon

Creator:: Loukili, Taoufik
Publisher:: Ibtikarat team
Type:: text, lexicon, and lexicalConceptualResource
Subject:: proclitics
Language:: Arabic
Description:: An XML-based file containing all Arabic proclitics
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

43. Arabic Special verbs Lexicon

Creator:: Namly, Driss
Publisher:: Ibtikarat team
Type:: text, lexicon, and lexicalConceptualResource
Subject:: particles
Language:: Arabic
Description:: An XML-based file containing Arabic Stop-words respecting verbs syntax
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

44. Arabic Triliteral roots Lexicon

Creator:: Mustafa, Ebtihal and Bouzoubaa, Karim
Publisher:: MDPI language jurnal
Type:: text, lexicon, and lexicalConceptualResource
Subject:: Arabic language, Arabic roots, lexicons, phonetic system, bigram frequencies, and roots weight.
Language:: Arabic
Description:: Description: This xml file is a lexicon containing all 21952 (28x28x28) Arabic triliteral combinations (roots). the file is split into three parts as follow: the first part contains the phonetic constraints that must be taken into account in the formation of Arabic roots (for more details see all_phonetic_rules.xml in http://arabic.emi.ac.ma/alelm/?q=Resources). the second part contains the lexicons that were used to create this lexicon (see in lexicons tag). the third part contains the roots. ISLRN: 813-907-570-946-2
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

45. Arabic WordNet ontology

Creator:: Abouenour, Lahcen, Bouzoubaa, Karim, and Rosso, Paulo
Publisher:: Wordnet
Type:: text, wordnet, and lexicalConceptualResource
Subject:: WordNet
Language:: Arabic
Description:: This improved version is an extension of the original Arabic Wordnet (http://globalwordnet.org/arabic-wordnet/awn-browser/), it was enriched by new verbs, nouns including the broken plurals that is a specific form for Arabic words.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

46. Artificial Treebank with Ellipsis

Creator:: Droganova, Kira, Zeman, Daniel, Kanerva, Jenna, and Ginter, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: universal dependencies, ellipsis, and gapping
Language:: English, Czech, Finnish, Russian, and Slovak
Description:: Artificially created treebank of elliptical constructions (gapping), in the annotation style of Universal Dependencies. Data taken from UD 2.1 release, and from large web corpora parsed by two parsers. Input data are filtered, sentences are identified where gapping could be applied, then those sentences are transformed, one or more words are omitted, resulting in a sentence with gapping. Details in Droganova et al.: Parse Me if You Can: Artificial Treebanks for Parsing Experiments on Elliptical Constructions, LREC 2018, Miyazaki, Japan.
Rights:: Licence Universal Dependencies v2.1, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.1, and PUB

47. Aspect-Term Annotated Customer Reviews in Czech

Creator:: Fiala, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: sentiment analysis, opinion target, and customer review
Language:: Czech
Description:: This dataset contains a number of user product reviews which are publicly available on the website of an established Czech online shop with electronic devices. Each review consists of negative and positive aspects of the product. This setting pushes the customer to rate important characteristics. We have selected 2000 positive and negative segments from these reviews and manually tagged their targets. Additionally, we selected 200 of the longest reviews and annotated them in the same way. The targets were either aspects of the evaluated product or some general attributes (e.g. price, ease of use).
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

48. ATCC: Pronunciation lexicon and n-gram counts for ASR module

Creator:: Šmídl, Luboš
Publisher:: University of West Bohemia, Department of Cybernetics
Type:: text, lexicalConceptualResource, and other
Subject:: pronunciation lexicon, n-gram counts, and language model
Language:: English
Description:: The corpus contains pronunciation lexicon and n-gram counts (unigrams, bigrams and trigrams) that can be used for constructing the language model for air traffic control communication domain. It could be used together with the Air Traffic Control Communication corpus (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0). and Technology Agency of the Czech Republic, project No. TA01030476
Rights:: Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0), http://creativecommons.org/licenses/by-nc/3.0/, and PUB

49. Automatic Paraphrases of Czech Reference Sentences for WMT11, 13 and 14

Creator:: Barančíková, Petra and Tamchyna, Aleš
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: machine translation, automatic evaluation, and paraphrasing
Language:: Czech
Description:: This dataset contains automatic paraphrases of Czech official reference translations for the Workshop on Statistical Machine Translation shared task. The data covers the years 2011, 2013 and 2014. For each sentence, at most 10000 paraphrases were included (randomly selected from the full set). The goal of using this dataset is to improve automatic evaluation of machine translation outputs. If you use this work, please cite the following paper: Tamchyna Aleš, Barančíková Petra: Automatic and Manual Paraphrases for MT Evaluation. In proceedings of LREC, 2016.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

50. Automatically generated spelling correction corpus for Czech (Czech-SEC-AG)

Creator:: Hajič, Jan, Náplava, Jakub, and Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: spelling correction and natural language correction
Language:: Czech
Description:: Automatically generated spelling correction corpus for Czech (Czesl-SEC-AG) is a corpus containg text with automatically generated spelling errors. To create spelling errors, a character error model containing probabilities of character substitution, insertion, deletion and probabilities of swaping two adjacent characters is used. Besides these probabilities, also the probabilities of changing character casing are considered. The original clean text on which the spelling errors were generated is PDT3.0 (http://hdl.handle.net/11858/00-097C-0000-0023-1AAF-3). The original train/dev/test sentence split of PDT3.0 corpus is preserved in this dataset. Besides the data with artificial spelling errors, we also publish texts from which the character error model was created. These are the original manual transcript of an audiobook Švejk and its corrected version performed by authors of Korektor (http://ufal.mff.cuni.cz/korektor). These data are similarly to CzeSL Grammatical Error Correction Dataset (CzeSL-GEC: http://hdl.handle.net/11234/1-2143) processed into four sets based on error difficulty present.
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

41. Arabic Phonetic Rules

42. Arabic Proclitics Lexicon

43. Arabic Special verbs Lexicon

44. Arabic Triliteral roots Lexicon

45. Arabic WordNet ontology

46. Artificial Treebank with Ellipsis

47. Aspect-Term Annotated Customer Reviews in Czech

48. ATCC: Pronunciation lexicon and n-gram counts for ASR module

49. Automatic Paraphrases of Czech Reference Sentences for WMT11, 13 and 14

50. Automatically generated spelling correction corpus for Czech (Czech-SEC-AG)

Limit your search

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Show values starting with

Creator

Show values starting with

Language

Show values starting with

Publisher

Show values starting with

Rights

Show values starting with

Subject

Show values starting with

Type

Show values starting with

Date

Original context has metadata only

Harvested from