MTMonkey is a web service which handles and distributes JSON-encoded HTTP requests for machine translation (MT) among multiple machines running an MT system, including text pre- and post processing.
It consists of an application server and remote workers which handle text processing and communicate translation requests to MT systems. The communication between the application server and the workers is based on the XML-RPC protocol. and The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 257528 (KHRESMOI). This work has been using language resources developed and/or stored and/or distributed by the LINDAT-Clarin project of the Ministry of Education of the Czech Republic (project LM2010013). This work has been supported by the AMALACH grant (DF12P01OVV02) of the Ministry of Culture of the Czech Republic.
The THEaiTRobot 1.0 tool allows the user to interactively generate scripts for individual theatre play scenes.
The tool is based on GPT-2 XL generative language model, using the model without any fine-tuning, as we found that with a prompt formatted as a part of a theatre play script, the model usually generates continuation that retains the format.
We encountered numerous problems when generating the script in this way. We managed to tackle some of the problems with various adjustments, but some of them remain to be solved in a future version.
THEaiTRobot 1.0 was used to generate the first THEaiTRE play, "AI: Když robot píše hru" ("AI: When a robot writes a play").
The THEaiTRobot 2.0 tool allows the user to interactively generate scripts for individual theatre play scenes.
The previous version of the tool (http://hdl.handle.net/11234/1-3507) was based on GPT-2 XL generative language model, using the model without any fine-tuning, as we found that with a prompt formatted as a part of a theatre play script, the model usually generates continuation that retains the format.
The current version also uses vanilla GPT-2 by default, but can also instead use a GPT-2 medium model fine-tuned on theatre play scripts (as well as film and TV series scripts). Apart from the basic "flat" generation using a theatrical starting prompt and the script model, the tool also features a second, hierarchical variant, where in the first step, a play synopsis is generated from its title using a synopsis model (GPT-2 medium fine-tuned on synopses of theatre plays, as well as film, TV series and book synopses). The synopsis is then used as input for the second stage, which uses the script model.
The choice of models to use is done by setting the MODEL variable in start_server.sh and start_syn_server.sh
THEaiTRobot 2.0 was used to generate the second THEaiTRE play, "Permeation/Prostoupení".
Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts.
The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions. The scripts implement data pre-processing and building acoustic models using the HTK and Kaldi toolkits.
This is the scripts part of the dataset. and This research was funded by the Ministry of
Education, Youth and Sports of the Czech Republic under the grant agreement
LK11221.