Transcripts of longitudinal audio recordings of 7 typically developing monolingual Czech children aged between 1;7 and 3;9. Files are in plain text with UTF-8 encoding. Each file represents one recording session of one of the target children and is named with the pseudonym of the child and their age at the given session in the form YMMDD. Transcription rules and other details can be found on the homepage coczefla.ff.cuni.cz.
A new version of the previously published corpus Chroma. Version 2023.04 includes six children. Two transcripts (Julie20221, Klara30424) were removed since they did not meet the criteria for the dialogical format. The transcripts were revised (eliminating typing errors and inconsistencies in the transcription format) and morphologically annotated by the automatic tool MorphoDiTa. The annotation of children's utterances was checked manually in detail; the annotation of adult data has not been checked yet. Files are in plain text with UTF-8 encoding. Each file represents one recording session of one of the target children and is named with the alias of the child and their age at the given session in the form YMMDD. Transcription rules and other details can be found on the homepage coczefla.ff.cuni.cz.
A new version of the previously published corpus Chroma with morphological annotation. Version 2023.07 differs from 2023.04 in that it includes all seven children and has undergone an additional careful check of consistency and conformity to the CHAT transcription principles.
Two transcripts (Julie20221, Klara30424) from the previous versions (2022.07, 2019.07) were removed since they did not meet our criteria for dialogical format. All transcripts of recordings made during one day were merged into one file. Thus, version 2023.07 consists of 183 files/transcripts. The number of utterances and tokens given here in LINDAT corresponds to children's lines only.
Files are in plain text with UTF-8 encoding. Each file represents one recording session of one of the target children and is named with the alias of the child and their age at the given session in the form YMMDD. Transcription rules and other details can be found on the homepage coczefla.ff.cuni.cz.
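The naming scheme can be unpacked programmatically. The following is a minimal Python sketch, assuming the name passed in has no file extension and consists of the child's alias followed by exactly five digits (one for years, two for months, two for days), as in the transcript names Julie20221 and Klara30424 mentioned above. The names parse_session_name, SessionName and chat_age are hypothetical helpers introduced for illustration and are not part of the corpus distribution.

```python
import re
from typing import NamedTuple


class SessionName(NamedTuple):
    """Parts of a transcript name (hypothetical helper type)."""
    child: str
    years: int
    months: int
    days: int


# Assumption: alias made of letters only, then YMMDD (1 + 2 + 2 digits).
_NAME = re.compile(r"^(?P<child>[^\W\d_]+)(?P<y>\d)(?P<mm>\d{2})(?P<dd>\d{2})$")


def parse_session_name(stem: str) -> SessionName:
    """Split a name such as 'Julie20221' into the alias and the age parts."""
    match = _NAME.match(stem)
    if match is None:
        raise ValueError(f"name does not follow the alias+YMMDD pattern: {stem!r}")
    return SessionName(
        match["child"], int(match["y"]), int(match["mm"]), int(match["dd"])
    )


def chat_age(session: SessionName) -> str:
    """Format the age in the years;months.days notation used in CHAT."""
    return f"{session.years};{session.months:02d}.{session.days:02d}"


if __name__ == "__main__":
    for name in ("Julie20221", "Klara30424"):
        parsed = parse_session_name(name)
        print(parsed.child, chat_age(parsed))
```

Under these assumptions, Julie20221 is read as the child Julie at 2 years, 2 months and 21 days. Names that deviate from the pattern raise a ValueError rather than being guessed at.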
Segment of the Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) 1942 No. 41 captures the funeral of Alexandr Commichau, head of the Reich Labour Service, held on 5 October 1942 in the Spanish Hall of Prague Castle, decorated with Nazi emblems for the occasion. The deceased's military honours are on display next to a bier with the coffin. Mourners include the widow, a little girl and State Secretary Hermann Frank. Acting Reich Protector Kurt Daluege lays down a wreath from Adolf Hitler. The funeral speech is delivered by Deputy Chief General of RAD Wilhelm Decker (silent). The procession with the coffin, which will be transported to a crematorium in Prague, moves through the Matthias Gate.
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 49AB from 1943 captures the concert called Five Years Leading the Nation, which was organised by the Board of Trustees for the Education of Youth to mark the fifth anniversary of Emil Hácha's presidency, and held at Smetana Hall in the Municipal House in Prague on 29 November. General Secretary of the Board František Teuner gave a speech at the formal event. The programme included a selection of folk songs by Otakar Jeremiáš performed by the Czech Choir and the Kühn Children's Choir accompanied by the Czech Philharmonic under the baton of Karel Šejna.
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 42A from 1944 was shot during a concert organised by the Board of Trustees for the Education of Youth and held in the Smetana Hall of the Municipal House in Prague on 3 October. The concert marked the 70th anniversary of the birth of the late composer Josef Suk. The programme, prepared by the Czech Philharmonic led by conductor Otakar Pařík, included the symphonic poem "Praga".
Czech data - both train and test+eval sets, as well as the valency dictionary - for the CoNLL 2009 Shared Task. Documentation is included. The data are generated from PDT 2.0. LDC catalog number: LDC2009E34B and MSM 0021620838 (http://ufal.mff.cuni.cz:8080/bib/?section=grant&id=116488695895567&mode=view)
Czech trial (example) data for CoNLL 2009 Shared Task. The data are generated from PDT 2.0. LDC2009E32B and MSM 0021620838 (http://ufal.mff.cuni.cz:8080/bib/?section=grant&id=116488695895567&mode=view)
CoNLL 2017 and 2018 shared tasks:
Multilingual Parsing from Raw Text to Universal Dependencies
This package contains the test data in the form in which they were presented
to the participating systems: raw text files and files preprocessed by UDPipe.
The metadata.json files contain lists of files to process and to output;
README files in the respective folders describe the syntax of metadata.json.
For full training, development and gold standard test data, see
Universal Dependencies 2.0 (CoNLL 2017)
Universal Dependencies 2.2 (CoNLL 2018)
See the download links at http://universaldependencies.org/.
For more information on the shared tasks, see
http://universaldependencies.org/conll17/
http://universaldependencies.org/conll18/
Contents:
conll17-ud-test-2017-05-09 ... CoNLL 2017 test data
conll18-ud-test-2018-05-06 ... CoNLL 2018 test data
conll18-ud-test-2018-05-06-for-conll17 ... CoNLL 2018 test data with metadata
and filenames modified so that they are digestible by the 2017 systems.