The corpus contains pronunciation lexicon and n-gram counts (unigrams, bigrams and trigrams) that can be used for constructing the language model for air traffic control communication domain. It could be used together with the Air Traffic Control Communication corpus (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0). and Technology Agency of the Czech Republic, project No. TA01030476
Phonological neighborhood density is known to influence lexical access, speech production as well as perception processes. Lexical competition is thought to be the central concept from which the neighborhood effect emanates: highly competitive neighborhoods are characterized by large degrees of phonemic co-activation, which can delay speech recognition and facilitate speech production. The present study investigates phonetic learning in English as a foreign language in relation to phonological neighborhood density and onset density to see whether dense or sparse neighborhoods are more conducive to the incorporation of novel phonetic detail. In addition, the effect of voice-contrasted minimal pairs (bat-pat) is explored. Results indicate that sparser neighborhoods with weaker lexical competition provide the most optimal phonological environment for phonetic learning. Moreover, novel phonetic details are incorporated faster in neighborhoods without minimal pairs. Results indicate that lexical competition plays a role in the dissemination of phonetic updates in the lexicon of foreign language learners.
The file contains the charts, tables and figures serving to delineate the metaphor-metonymy cognitive mechanism behind English denominal verbs. The data was obtained by questionnaires and interviews, which was then documented into charts and tables. Figures submitted mainly provide clear outline and concise outline of the metaphor-metonymy models of denominalization.
We have created test set for syntactic questions presented in the paper [1] which is more general than Mikolov's [2]. Since we were interested in morphosyntactic relations, we extended only the questions of the syntactic type with exception of nationality adjectives which is already covered completely in Mikolov's test set.
We constructed the pairs more or less manually, taking inspiration in the Czech side of the CzEng corpus [3], where explicit morphological annotation allows to identify various pairs of Czech words (different grades of adjectives, words and their negations, etc.). The word-aligned English words often shared the same properties. Another sources of pairs were acquired from various webpages usually written for learners of English. For example for verb tense, we relied on a freely available list of English verbs and their morphological variations.
We have included 100-1000 different pairs for each question set. The questions were constructed from the pairs similarly as by Mikolov: generating all possible pairs of pairs. This leads to millions of questions, so we randomly selected 1000 instances per question set, to keep the test set in the same order of magnitude. Additionally, we decided to extend set of questions on opposites to cover not only opposites of adjectives but also of nouns and verbs.
The book [1] contains spelling rules classified into ten categories, each category containing many rules. This XML file presents our implemented rules classified with six category tags, as is the case in the book. We implemented 24 rules since the remaining rules require diacritical and morphological analysis that are outside the scope of our present work.
References:
[1] Dr.Fahmy Al-Najjar, 'Spelling rules in ten easy lessons', Al Kawthar Library,2008. Available: https://www.alukah.net/library/0/53498/%D9%82%D9%88%D8%A7%D8%B9%D8%AF-%D8%A7%D9%84%D8%A5%D9%85%D9%84%D8%A7%D8%A1-%D9%81%D9%8A-%D8%B9%D8%B4%D8%B1%D8%A9-%D8%AF%D8%B1%D9%88%D8%B3-%D8%B3%D9%87%D9%84%D8%A9-pdf/
General Information:
Data collector: Jean Costa Silva (University of Georgia)
Date of collection: September-December 2022
Manner of collection: Online questionnaire via Qualtrics
Funding: No
Dataset collected from natural dialogs which enables to test the ability of dialog systems to interactively learn new facts from user utterances throughout the dialog. The dataset, consisting of 1900 dialogs, allows simulation of an interactive gaining of denotations and questions explanations from users which can be used for the interactive learning.
VPS-GradeUp is a collection of triple manual annotations of 29 English verbs based on the Pattern Dictionary of English Verbs (PDEV) and comprising the following lemmas: abolish, act, adjust, advance, answer, approve, bid, cancel, conceive, cultivate, cure, distinguish, embrace, execute, hire, last, manage, murder, need, pack, plan, point, praise, prescribe, sail, seal, see, talk, urge . It contains results from two different tasks:
1. Graded decisions
2. Best-fit pattern (WSD) .
In both tasks, the annotators were matching verb senses defined by the PDEV patterns with 50 actual uses of each verb (using concordances from the BNC [2]). The verbs were randomly selected from a list of completed PDEV lemmas with at least 3 patterns and at least 100 BNC concordances not previously annotated by PDEV’s own annotators. Also, the selection excluded verbs contained in VPS-30-En[3], a data set we developed earlier. This data set was built within the project Reviving Zellig S. Harris: more linguistic information for distributional lexical analysis of English and Czech and in connection with the SemEval-2015 CPA-related task.