Provides spelling, an overview of meanings, synonyms, pronunciation (audio file), etymology, grammar, typical word combinations (computer-generated), as well as meanings, examples, and idiomatic expressions (additionally: the words that precede and follow the entry alphabetically)
Composed of the Deutsches Wörterbuch, the Wörterbuch der Deutschen Gegenwartssprache (WDG), and the Etymologisches Wörterbuch des Deutschen (EtymWb)
Database of three inter-related early Irish glossaries. The texts, compiled from the eighth century, comprise several thousand headwords followed by entries that can range from single word explanations to whole narratives running to several pages.
EDBL (Lexical DataBase for Basque) is the lexical basis needed for the automatic treatment of Basque. It is made up of about 120,000 entries divided into dictionary entries (the same as those found in a conventional dictionary), verb forms, and dependent morphemes, all of them with their respective morphological information.
Focus: description of meaning and usage; additionally: orthography, hyphenation, and grammatical information; still under construction
Data collection was carried out using the Sketch Engine program.
Data were extracted from the annotated English web corpus enTenTen20.
Data collection and analysis were carried out over a period of two months: April and May 2023.
The enTenTen20 corpus has recently been updated to a newer version, enTenTen21. Nevertheless, the older version is still available, can be worked on, and can be compared with the newer one. The differences between the two versions of the English web corpus did not affect the results of this study. The only apparent difference was in slightly different frequency values for specific collocations. This was expected, since the older version of the web corpus consists of 36 billion words, while the new version contains 52 billion words. As noted above, these frequency deviations were not significant enough to refute the hypotheses; rather, they confirmed them once again.
This study is one of the results of work on a larger research project called "Metaphorical collocations - syntagmatic relations between semantics and pragmatics". More information about the project is available at the following link: https://metakol.uniri.hr/en/opis-projekta/
The study was funded by the Croatian Science Foundation.
Working with the data/replicating the study:
Data collected for the purposes of this study is available in CSV format.
Data for each gustatory adjective (collocate) is presented in a separate CSV file.
Upon opening each file, widen the columns so that all data are visible.
The tables show the collocational bases (nouns) found in the corpus in combination with a specific gustatory adjective (their collocate).
These nouns are ranked by their Mutual Information (MI) score, which expresses the extent to which words co-occur compared to the number of times they appear separately.
The tables show the type of mapping present in a given collocation (e.g., intra-modal or cross-modal).
The tables show the type of meaning or cognitive process underlying the meaning formation (e.g., metonymic or metaphoric).
For every analyzed collocation, we provide a contextualized example of its use from the corpus, along with a hyperlink to where it can be found.
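To illustrate the Mutual Information score by which the nouns are listed, the following is a minimal sketch in Python. It assumes the score follows the commonly documented form MI = log2((f_AB * N) / (f_A * f_B)), where f_AB is the frequency of the collocation, f_A and f_B are the frequencies of the two words, and N is the corpus size. The function name and the counts in the example are hypothetical and are not taken from the data files; the exact computation used by Sketch Engine may differ in detail.

```python
import math


def mi_score(f_ab: int, f_a: int, f_b: int, n: int) -> float:
    """Return an MI score for a word pair (assumed formula, see lead-in).

    f_ab: frequency of the two words occurring together
    f_a:  frequency of the first word (e.g., the adjective)
    f_b:  frequency of the second word (e.g., the noun)
    n:    total number of tokens in the corpus
    """
    if min(f_ab, f_a, f_b, n) <= 0:
        raise ValueError("all counts must be positive")
    # Ratio of observed co-occurrence to what the separate
    # frequencies alone would predict, on a log2 scale.
    return math.log2((f_ab * n) / (f_a * f_b))


# Hypothetical counts, chosen only for illustration.
print(mi_score(f_ab=32, f_a=64, f_b=128, n=1024))
```

A score of 0 would mean the two words co-occur exactly as often as their separate frequencies predict; higher scores indicate a stronger association. Because the score depends on corpus size and raw frequencies, small differences between corpus versions are to be expected.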
EngVallex is the English counterpart of the PDT-Vallex valency lexicon, using the same view of valency, valency frames, and the description of the surface form of verbal arguments. EngVallex also contains links to PropBank and VerbNet, two existing English predicate-argument lexicons used, among other things, for the PropBank project. The EngVallex lexicon is fully linked to the English side of the PCEDT parallel treebank, which is in fact the PTB re-annotated using the Prague Dependency Treebank style of annotation. EngVallex is available in XML format in our repository, and also in a searchable form with examples from the PCEDT.