« Previous |
41 - 45 of 45
|
Next »
Number of results to display per page
Search Results
42. Semantic Difference Keywords - word2vec embeddings
- Creator:
- Gribomont, Isabelle
- Publisher:
- Cental (UCL)
- Type:
- text, other, and lexicalConceptualResource
- Subject:
- semantic change and lexical semantics
- Language:
- Spanish
- Description:
- Embeddings from word2vec model described in "From Diachronic to Contextual Lexical Semantic Change: Introducing Semantic Difference Keywords (SDKs) for Discourse Studies". Full reference TBC.
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
43. Speech Commands Dataset Enhanced for Direction-of-Arrival Estimation
- Creator:
- Beneš, David
- Publisher:
- University of West Bohemia, Department of Cybernetics
- Type:
- audio and corpus
- Subject:
- speech commands and keyword direction of arrival
- Language:
- English
- Description:
- This dataset can serve as a training and evaluation corpus for the task of training keyword detection with speaker direction estimation (keyword direction of arrival - KWDOA). It was created by processing the existing Speech Commands dataset [1] with the PyroomAcoustics library so that the resulting speech recordings simulate the usage of a circular microphone array with 4 microphones having a distance of 57 mm between adjacent microphones. Such design of a simulated microphone array was chosen in order to match the existing physical microphone array from the Seeeduino series. [1] Warden, Pete. “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition.” ArXiv.org, 2018, arxiv.org/abs/1804.03209
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
44. VPS-GradeUp (2016-10-10)
- Creator:
- Baisa, Vít, Cinková, Silvie, Krejčová, Ema, and Vernerová, Anna
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, other, and lexicalConceptualResource
- Subject:
- Pattern Dictionary of English Verbs, usage patterns, lexical semantics, dictionaries, clustering, Corpus Pattern Analysis, verbs, graded decisions, Likert scale, and Word Sense Disambiguation
- Language:
- English
- Description:
- VPS-GradeUp is a collection of triple manual annotations of 29 English verbs based on the Pattern Dictionary of English Verbs (PDEV) and comprising the following lemmas: abolish, act, adjust, advance, answer, approve, bid, cancel, conceive, cultivate, cure, distinguish, embrace, execute, hire, last, manage, murder, need, pack, plan, point, praise, prescribe, sail, seal, see, talk, urge . It contains results from two different tasks: 1. Graded decisions 2. Best-fit pattern (WSD) . In both tasks, the annotators were matching verb senses defined by the PDEV patterns with 50 actual uses of each verb (using concordances from the BNC [2]). The verbs were randomly selected from a list of completed PDEV lemmas with at least 3 patterns and at least 100 BNC concordances not previously annotated by PDEV’s own annotators. Also, the selection excluded verbs contained in VPS-30-En[3], a data set we developed earlier. This data set was built within the project Reviving Zellig S. Harris: more linguistic information for distributional lexical analysis of English and Czech and in connection with the SemEval-2015 CPA-related task.
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
45. WordSim353-cs: Evaluation Dataset for Lexical Similarity and Relatedness, based on WordSim353
- Creator:
- Cinková, Silvie, Straková, Jana, Hajič, Jakub, Hajič, Jan, Hajič, Jan, jr., Janoušková, Jolana, Straka, Milan, and Urešová, Miroslava
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, wordList, and lexicalConceptualResource
- Subject:
- lexical semantics, similarity, relatedness, evaluation, and distributional semantics
- Language:
- Czech and English
- Description:
- Czech translation of WordSim353. The Czech translation of English WordSim353 word pairs were obtained from four translators. All translation variants were scored according to the lexical similarity/relatedness annotation instructions for WordSim353 annotators, by 25 Czech annotators. The resulting data set consists of two annotation files: "WordSim353-cs.csv" and "WordSim-cs-Multi.csv". Both files are encoded in UTF-8, have a header, text is enclosed in double quotes, and columns are separated by commas. The rows are numbered. The WordSim-cs-Multi data set has rows numbered from 1 to 634, whereas the row indices in the WordSim353-cs data set reflect the corresponding row numbers in the WordSim-cs-Multi data set. The WordSim353-cs file contains a one-to-one mapping selection of 353 Czech equivalent pairs whose judgments have proven to be most similar to the judgments of their corresponding English originals (compared by the absolute value of the difference between the means over all annotators in each language counterpart). In one case ("psychology-cognition"), two Czech equivalent pairs had identical means as well as confidence intervals, so we randomly selected one. The "WordSim-cs-Multi.csv" file contains human judgments for all translation variants. In both data sets, we preserved all 25 individual scores. In the WordSim353-cs data set, we added a column with their Czech means as well as a column containing the original English means and 95% confidence intervals in separate columns for each mean (computed by the CI function in the Rmisc R package). The WordSim-cs-Multi data set contains only the Czech means and confidence intervals. For the most convenient lexical search, we provided separate columns with the respective Czech and English single words, entire word pairs, and eventually an English-Czech quadruple in both data sets. The data set also contains an xls table with the four translations and a preliminary selection of the best variants performed by an adjudicator.
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB