On 9 July 2014, we recorded the occurrence of the narrow-clawed crayfish (Astacus leptodactylus) at river kilometer 77.88 of the River Morava. We observed four adults, two females and two males. The total body length of these individuals ranged between 71 and 75 mm, with carapace lengths of 34-42 mm. The crayfish were found in a rocky part of the right bank of the stream, along a stretch approximately 5 m long. This is the first record of this species in the River Morava in the Czech Republic. (Lukáš Jurek, Martin Chytrý)
The dataset used for the Ptakopět experiment on outbound machine translation. It consists of screenshots of web forms with user queries entered; the queries are also available in text form. The dataset comprises two language versions: English and Czech. Whereas the English version has been fully post-processed (screenshots cropped, queries within the screenshots highlighted, dataset split by quality, etc.), the Czech version is raw, as collected by the annotators.
Restaurant Reviews CZ ABSA - 2.15k reviews, each annotated with its related target and category
The work is described in the paper: https://doi.org/10.13053/CyS-20-3-2469
The data contains the morphemic dictionary scanned in PDF format. It is divided into three parts:
introductions.pdf - pp. 11-102
main_dictionary.pdf - pp. 113-506
appendices.pdf - pp. 509-645
The file contains all Czech verbs included in the Retrograde Morphemic Dictionary of the Czech Language (Eleonora Slavíčková, Academia, 1975).
The data was obtained by scanning the portion of the dictionary that contains words ending in -ci and -ti. Among them were 18 non-verbs, which were removed. Using OCR, the data was converted into plain text format, and the result was checked by two independent readers. However, should you encounter a remaining error, please report it.
RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses the current state of the art in all five evaluated NLP tasks, and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base, both for PyTorch and TensorFlow.
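Since the model is published on the Hugging Face Hub, it can be loaded with the Transformers library. Below is a minimal sketch of using the released checkpoint for masked-token prediction; it assumes the `transformers` and `torch` packages are installed and that the model can be downloaded from the Hub (the example sentence is an illustrative choice, not part of the release).

```python
# Minimal sketch: load RobeCzech from the Hugging Face Hub and fill in a mask.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")
model = AutoModelForMaskedLM.from_pretrained("ufal/robeczech-base")

# Short Czech sentence with one masked token ("Prague is the capital ... of the Czech Republic").
text = f"Praha je hlavní {tokenizer.mask_token} České republiky."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring prediction at the mask position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

The same checkpoint can also be used as a plain encoder (via `AutoModel`) to produce contextual embeddings for downstream Czech NLP tasks.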