Segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1942, issue no. 43, shows the ceremony renaming part of the Vltava waterfront in Prague (today Smetana Quay) to Reinhard-Heydrich-Ufer on Sunday, 18 September 1942. The ceremony is attended by Heydrich's widow Lina Heydrich, Acting Reich Protector Kurt Daluege with his entourage, State Secretary Karl Hermann Frank, Deputy Mayor of the City of Prague Josef Pfitzner, Prime Minister of the Protectorate Government Jaroslav Krejčí, and Minister of the Interior Richard Bienert. The National Theatre is seen in the background. Josef Pfitzner opens the ceremony. In his speech (shown without sound), Karl Hermann Frank highlights Reinhard Heydrich's political legacy and the duties it imposes on Germans and Czechs. Frank then performs the renaming and unveils a plaque reading "Reinhard-Heydrich-Ufer." The participants give the Nazi salute as the German anthem plays.
Restaurant Reviews CZ ABSA: 2.15k restaurant reviews annotated with their aspect targets and categories for aspect-based sentiment analysis (ABSA).
The work is described in the paper: https://doi.org/10.13053/CyS-20-3-2469
The data consists of the morphemic dictionary scanned to PDF. It is divided into three parts:
introductions.pdf - pp. 11-102
main_dictionary.pdf - pp. 113-506
appendices.pdf - pp. 509-645
The file contains all Czech verbs included in the Retrograde Morphemic Dictionary of Czech Language (Slavíčková Eleonora, Academia 1975).
The data was obtained by scanning the portion of the dictionary that contains words ending in -ci and -ti. Among them were 18 non-verbs, which were removed. The scans were converted to plain text using OCR, and the result was checked by two independent readers. If you nevertheless encounter a remaining error, please report it.
RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally sized multilingual and Czech-trained contextualized language representation models, surpasses the current state of the art in all five evaluated NLP tasks, and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base, both for PyTorch and TensorFlow.
ROMi is a specific subcorpus of CZESL (Czech as a Second Language). It collects examples of spoken and written language use by Czech Romani children and teenagers. The material exceeds 1.5 million words.
Language Material
The material documents the spoken language of a specific group of Romani speakers who use Czech as their first language. This form of the language differs markedly from the Czech of the Czech-speaking majority, primarily in speech and secondarily in writing. It is the so-called Romani ethnolect of Czech, i.e. a variety of Czech used by Romani communities, mainly in the Czech Republic, which shows clear influence of Romani, Slovak and Hungarian. Furthermore, many of the recorded speakers live in social exclusion, so their language production is shaped by both factors: the Romani ethnolect and social exclusion.
The language material was collected in 2009–2012 by the Technical University of Liberec and the Institute of Czech Language and Theory of Communication, Faculty of Arts, Charles University, within the project Innovations of Czech as a Second Language Education under the Education for Competitiveness Operational Programme. The material was processed with the support of the Institute of Formal and Applied Linguistics (project LINDAT-Clarin).
The material comprises 110 recordings obtained in various environments: collection took place both in schools and in several non-profit organizations offering leisure-time activities to Romani students. Apart from the school setting, the recordings therefore come from extracurricular activities, sports matches and households. Both the respondents and the collectors are Romani. The samples were acquired in all regions of the Czech Republic, although most recordings come from the Central Bohemian, South Bohemian, Ústí and Vysočina Regions. The respondents are between 12 and 28 years old.
The collected samples are accompanied by metadata relating to the following areas:
• The place of origin: place of collection, size of the residence, dialect area, region, environment (school, extracurricular, private), and whether the locality is socially excluded.
• The circumstances of the collection, expressing the extent of control exercised by the collector (topic assigned/not assigned).
• The respondent: age, class/year, sex, type of school, self-assessed knowledge of Romani, first language (the one the student considers to be their first), and communicative environment in the family (which language or languages are used for communication at home).
• The place of data collection: for schools, the type of school (primary, for students with special needs, remedial, vocational, secondary) and its founder (state, church, private organisation); for data collected individually, the organisation, interest group, etc.
• The collector: abbreviated name, area of work and, in some cases, age.
Defining the group of respondents
The respondents are students of primary schools, schools for students with special needs and secondary schools, as well as teenagers who have just completed compulsory education. For the purposes of the collection, students who consider themselves Romani or who are considered Romani by others were included in the sample. A language criterion was added to this definition, so students in whose families Romani is spoken at home were also included. Active knowledge of Romani was not required, since fewer than a third of Romani children living in the Czech Republic today are competent in the language.
Ethical aspects of the data collection and processing
The content of the language material places ethical demands on data processing. The texts and recordings frequently contain highly interesting material: the respondents talk about life stories that are remote from, or even inconceivable to, the majority society. During transcription, all materials are anonymized and identifying data are removed.
Field Research
When working in an environment threatened by social exclusion, it is essential to consider the needs and opportunities of the group members as well as the needs of those who live or work in such an environment. While developing the corpus, we became convinced that we had to adapt our demands on the quality of texts and recordings and not burden the respondents or the collectors with restrictive or impossible requirements. The corpus therefore includes several recordings of lower technical quality, made in the presence of other people, with the television on, and so on. Under stricter conditions, many recordings either would never have been made (it is natural that younger children were interviewed at home, in the presence of their parents), or would have been made but affected by the unnaturalness of the situation, which would in turn have affected the language material. Besides interviews with younger children, this applies especially to conversations between the collectors and their peers, e.g. in leisure-time clubs.
Characteristics of the recordings
The recordings come both from schools (especially conversations between teaching assistants and individual students) and from leisure-time facilities (interest groups, after-school tutoring). Most are conversations between the collector and a single respondent or a pair of respondents. The length of the recordings varies, but most last 20 to 35 minutes. On average, a recording contains approximately 2,495 words. The quality of the recordings is limited by the equipment usable in the field and by the effort to maximize authenticity.
Transcription of the recordings
The transcription rules are based on those designed for the SCHOLA corpus. The recordings are transcribed using folkloristic transcription, i.e. the variant closest to standard written form, specially adapted for computational processing in line with the practice established in the Czech National Corpus. Transcription is done in the Transcriber program, which aligns the audio track with the text.
The segment of Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel), 1937, issue no. 50 captures a speech delivered by Rudolf Beran, a member of the National Assembly, on 9 December 1937 regarding the future of Czechoslovakia.
Actress Růžena Nasková with her colleague Karel Hašler in Ahasver (Ahasuerus, dir. Jaroslav Kvapil, 1915). Nasková in a theatre dressing room in costume as Magdalena Dobromila Rettigová in a fragmented newsreel segment from Československý přehled (Czechoslovak Review) 1947, issue no. 36. Nasková with her husband, painter František Xaver Naske, on Myslíkova Street. Nasková accepts well wishes from young Pioneers on the occasion of her 70th birthday in a fragmented segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1954, issue no. 49.
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 41B from 1942 captures the Sailing Promotion Day event organised by the Board of Trustees for the Education of Youth, in collaboration with the Czech Yacht Association and the Czech Yacht Club, which was held on the Vltava River in Prague-Podolí on 27 September.