The SynSemClass Search Tool provides a web search tool for the SynSemClass 5.0 ontology. It includes several search options and criteria for building complex queries. The search results are rendered in a clear and user-friendly interactive representation.
The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language resources, i.e., relating verb meanings with respect to contextually-based verb synonymy. The lexicon entries are linked to PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2), CzEngVallex (http://hdl.handle.net/11234/1-1512), FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), VerbNet (https://uvi.colorado.edu/ and http://verbs.colorado.edu/verbnet/index.html), PropBank (http://propbank.github.io/), Ontonotes (http://clear.colorado.edu/compsem/index.php?page=lexicalresources&sub=ontonotes), and English Wordnet (https://wordnet.princeton.edu/).
The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language resources, i.e., relating verb meanings with respect to contextually-based verb synonymy. The lexicon entries are linked to PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2), CzEngVallex (http://hdl.handle.net/11234/1-1512), FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), VerbNet (https://uvi.colorado.edu/ and http://verbs.colorado.edu/verbnet/index.html), PropBank (http://propbank.github.io/), Ontonotes (http://clear.colorado.edu/compsem/index.php?page=lexicalresources&sub=ontonotes), and English Wordnet (https://wordnet.princeton.edu/).
CzEng is a sentence-parallel Czech-English corpus compiled at the Institute of Formal and Applied Linguistics (ÚFAL). While the full CzEng 2.0 is freely available for non-commercial research purposes from the project website (https://ufal.mff.cuni.cz/czeng), this release contains only the original monolingual parts of news text (csmono 53M and enmono 79M sentences) with automatic (synthetic) translations by CUBBITT.
See the attached README for additional details such as the file format.
The THEaiTRobot 1.0 tool allows the user to interactively generate scripts for individual theatre play scenes.
The tool is based on GPT-2 XL generative language model, using the model without any fine-tuning, as we found that with a prompt formatted as a part of a theatre play script, the model usually generates continuation that retains the format.
We encountered numerous problems when generating the script in this way. We managed to tackle some of the problems with various adjustments, but some of them remain to be solved in a future version.
THEaiTRobot 1.0 was used to generate the first THEaiTRE play, "AI: Když robot píše hru" ("AI: When a robot writes a play").
The THEaiTRobot 2.0 tool allows the user to interactively generate scripts for individual theatre play scenes.
The previous version of the tool (http://hdl.handle.net/11234/1-3507) was based on GPT-2 XL generative language model, using the model without any fine-tuning, as we found that with a prompt formatted as a part of a theatre play script, the model usually generates continuation that retains the format.
The current version also uses vanilla GPT-2 by default, but can also instead use a GPT-2 medium model fine-tuned on theatre play scripts (as well as film and TV series scripts). Apart from the basic "flat" generation using a theatrical starting prompt and the script model, the tool also features a second, hierarchical variant, where in the first step, a play synopsis is generated from its title using a synopsis model (GPT-2 medium fine-tuned on synopses of theatre plays, as well as film, TV series and book synopses). The synopsis is then used as input for the second stage, which uses the script model.
The choice of models to use is done by setting the MODEL variable in start_server.sh and start_syn_server.sh
THEaiTRobot 2.0 was used to generate the second THEaiTRE play, "Permeation/Prostoupení".
AMALACH project component TMODS:ENG-CZE; machine translation of queries from Czech to English. This archive contains models for the Moses decoder (binarized, pruned to allow for real-time translation) and configuration files for the MTMonkey toolkit. The aim of this package is to provide a full service for Czech->English translation which can be easily utilized as a component in a larger software solution. (The required tools are freely available and an installation guide is included in the package.)
The translation models were trained on CzEng 1.0 corpus and Europarl. Monolingual data for LM estimation additionally contains WMT news crawls until 2013.
The datasets described in Droganova, Kira, and Daniel Zeman. "Towards a Unified Taxonomy of Deep Syntactic Relations." Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2024.
Four languages are included in this release. English PropBank is omitted due to its license terms.
Treex::Web is a web frontend for running Treex applications from your browser.
Treex (formerly TectoMT) is a highly modular NLP framework implemented in Perl programming language. It is primarily aimed at Machine Translation, making use of the ideas and technology created during the Prague Dependency Treebank project.
Pretrained model weights for the UDify model, and extracted BERT weights in pytorch-transformers format. Note that these weights slightly differ from those used in the paper.