A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
The dataset contains 4,731 frozen continuous Czech multiword expressions (MWEs). Inflected word forms are generated for those MWEs where applicable. In total, the dataset contains 24,807 MWE forms.
Titles of courses possibly relevant to the Digital Humanities for 2017-2018, manually gathered from course catalogues of most Czech state colleges, including the names of the teachers, department and school names, and the school-unique course IDs. All this information was publicly available in the individual course catalogues accessed from the official websites of the individual colleges.
The file contains charts, tables and figures that delineate the metaphor-metonymy cognitive mechanism behind English denominal verbs. The data were obtained through questionnaires and interviews and then documented in charts and tables. The submitted figures mainly provide a clear and concise outline of the metaphor-metonymy models of denominalization.