An XML-based file containing the electronic version of the Al Wassit dictionary, an Arabic monolingual dictionary compiled by the Academy of the Arabic Language in Cairo.
An LMF-conformant XML-based file containing the electronic version of the Al Wassit dictionary, an Arabic monolingual dictionary compiled by the Academy of the Arabic Language in Cairo.
A special edition of the Elektajournal from 1933 dedicated to the fifteenth anniversary of the Czechoslovak Republic. The introduction consists of a short retrospective composed of archival materials obtained in the streets of Prague on 28 October 1918. Footage of the parade on Wenceslaus Square on 28 October 1933. Raising the Czechoslovak flag in front of the Municipal House on Republic Square. Philips Radio vehicle with loudspeakers on the roof. Philips microphone. President Tomáš Garrigue Masaryk and Minister of National Defence Bohumír Bradáč during a military parade on Wenceslaus Square. The ceremonial parade heads down 28 October Street and National Avenue to Smetana Embankment. MPs František Soukup, František Staněk and Prime Minister Jan Malypetr watch the ceremony from a grandstand by the Rudolfinum. Shots of a Czechoslovak Army military parade; the parade consists of infantry, soldiers with bicycles, horse-drawn cannons and armoured cars. Flyover by military aircraft. Parade of scouts in front of President Masaryk in the third courtyard of Prague Castle.
The segment from a Degl film production company newsreel captures the celebrations of the 50th anniversary of the laying of the cornerstone of the National Theatre. The celebrations took place on 16-18 May 1918 in Prague, with the participation of representatives of all Slavic nations of the Austro-Hungarian Empire. The first shots show the festively decorated building of the National Theatre. In the next part, the camera observes the events taking place in the upper part of Wenceslas Square. The staircase and the ramp of the National Museum, where the opening ceremony took place (specifically in its Pantheon), are filled with young people in national folk costumes. Shots of the crowded square. Cultural and political figures, such as poets Adolf Heyduk and Pavol Országh Hviezdoslav, writer Alois Jirásek and the head of the National Theatre Opera Karel Kovařovic, are leaving the building of the National Museum. This is followed by the symbolic ceremonial removal of politician Karel Kramář from the building. Afterwards, the Slovenian writer and mayor of Ljubljana Ivan Tavčar is seen leaving the building, as well as Czech actors Eduard Vojan, Marie Hübnerová, Leopolda Dostalová, Marie Laudová-Hořicová, Karel Želenský, writers Ignát Herrmann, František Herites and Jan Herben with his wife Bronislava, poet Bohdan Kaminský, politicians Alois Rašín, František Soukup, Gustav Habrman, Václav Klofáč and other notable national figures.
The segment captures the celebration of the fifth anniversary of the Czechoslovak Republic held in Prague on 28 October 1923. Festivities by the Statue of St Wenceslaus on Wenceslaus Square. Karel Kramář stands at the rostrum. Crowds gathered on the square wave their hats.
Segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1942, issue no. 17, captures the presentation of a gift, Ambulance Train no. 751, from the Protectorate of Bohemia and Moravia to Adolf Hitler and the German army. The train handover took place at Prague Main Railway Station on 20 April 1942, the birthday of Adolf Hitler. Cars arrive in front of Prague Main Railway Station. Acting Reich Protector Reinhard Heydrich enters the train station. State President Emil Hácha gives a speech in the festively decorated railway hall. In response, Heydrich shakes his hand. The event is witnessed by a delegation of railway workers. The train crew lines up on the station platform. Heydrich enters the train with his entourage and inspects the sleeping cars, the operating carriage, the kitchen, and the sick bay. The inspection of the ambulance train is attended by Protectorate Prime Minister Jaroslav Krejčí and Minister of Education and People's Enlightenment Emanuel Moravec. According to the voiceover, the train was made in a railway workshop in Prague-Bubny in record time. It consisted of 28 carriages and 20 hospital carriages, was 410 metres long, weighed 545 tons and had capacity for 280 wounded.
A Gold Standard Word Alignment for English-Swedish (GES) is a resource containing 1164 manually word-aligned sentence pairs from the English and Swedish versions of Europarl v. 2.
The data can be found here: https://www.ida.liu.se/labs/nlplab/ges/
This is an open dataset of sentences from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).
This is an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification.
These are supplementary materials for an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification and is available at http://hdl.handle.net/11234/1-4615. These supplementary materials contain OCR texts from different OCR engines for book pages for which we have both high-resolution scanned images and annotations for OCR evaluation.
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) 1942, issue no. 27A, captures the Pledge of Czech Theatre Professionals' Allegiance to the Reich, a manifestation held at the National Theatre in Prague on 25 June 1942, which was to unequivocally condemn the assassination of Acting Reich Protector Reinhard Heydrich. Speeches are delivered by actor Rudolf Deyl Jr. and Minister of Education and People's Enlightenment Emanuel Moravec (silent). Actress Růžena Nasková and actors Karel Höger, Ferenc Futurista, and Stanislav Neumann are seen among the participants. The segment concludes with everyone performing the Nazi salute.
A morphological layer for the German part of the SMULTRON corpus. The layer was annotated according to the STTS tagset and the annotation guidelines of the Tiger corpus.
Coordinator: Thomas Müller
Annotators: Francesca Caratti, Arne Recknagel
This distribution contains a morphological layer for the SMULTRON corpus [0].
The annotation process is described in:
@InProceedings{mueller2015,
author = {M\"uller, Thomas and Sch\"utze, Hinrich},
title = {Robust Morphological Tagging with Word Representations},
booktitle = {Proceedings of NAACL},
year = {2015},
}
[0] http://www.cl.uzh.ch/research/parallelcorpora/paralleltreebanks/smultron_en.html
This small dataset contains 3 speech corpora collected using the Alex Translate telephone service (https://ufal.mff.cuni.cz/alex#alex-translate).
The "part1" and "part2" corpora contain English speech with transcriptions and Czech translations. These recordings were collected from users of the service. Part 1 contains earlier recordings, filtered to include only clean speech; Part 2 contains later recordings with no filtering applied.
The "cstest" corpus contains recordings of artificially created sentences, each containing one or more Czech names of places in the Czech Republic. These were recorded by a multinational group of students studying in Prague.
We present a test corpus of audio recordings and transcriptions of presentations of students' enterprises together with their slides and web-pages. The corpus is intended for evaluation of automatic speech recognition (ASR) systems, especially in conditions where the prior availability of in-domain vocabulary and named entities is beneficial.
The corpus consists of 39 presentations in English, each up to 90 seconds long, and slides and web-pages in Czech, Slovak, English, German, Romanian, Italian or Spanish.
The speakers are high school students from European countries with English as their second language.
We benchmark three baseline ASR systems on the corpus and show that they are far from perfect.
The application, developed in C#, automatically identifies the language of a text written in one of the 21 European Union languages. Using training texts in different languages (approx. 1.5 MB of text for each language), a training module counts the prefixes (the first 3 characters) and the suffixes (the last 4 characters) of all the words in the texts, for each language. For every language, two models are constructed, containing the weights (percentages) of the prefixes and suffixes in the texts representing that language. In the prediction phase, two models are built on the fly for the new text in the same manner. These models are then compared with the stored models of each language for which the application was trained. Using comparison functions, the best-matching model is chosen. More detailed descriptions are available in the following papers (http://www.racai.ro/~tufis/papers):
-- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2008). RACAI's Linguistic Web Services. In Proceedings of the 6th Language Resources and Evaluation Conference - LREC 2008, Marrakech, Morocco, May 2008. ELRA - European Language Resources Association. ISBN 2-9517408-4-0.
-- Dan Tufiş and Alexandru Ceauşu (2007). Diacritics Restoration in Romanian Texts. In Elena Paskaleva and Milena Slavcheva (eds.), A Common Natural Language Processing Paradigm for Balkan Languages - RANLP 2007 Workshop Proceedings, pp. 49-56, Borovets, Bulgaria, September 2007. INCOMA Ltd., Shoumen, Bulgaria. ISBN 978-954-91743-8-0.
-- Dan Tufiş and Adrian Chiţu (1999). Automatic Insertion of Diacritics in Romanian Texts. In Ferenc Kiefer, Gábor Kiss, and Júlia Pajzs (eds.), Proceedings of the 5th International Workshop on Computational Lexicography (COMPLEX 1999), pp. 185-194, Pecs, Hungary, May 1999. Linguistics Institute, Hungarian Academy of Sciences.
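The prefix and suffix models used by the language identifier described above can be illustrated with a minimal Python sketch. It is not the original C# application: the function names (`affix_model`, `similarity`, `identify`), the toy training sentences, and in particular the comparison function (a sum of per-affix minimum weights) are assumptions made for illustration, since the description does not say which comparison functions are used.

```python
import re
from collections import Counter


def affix_model(text, prefix_len=3, suffix_len=4):
    """Build (prefix_weights, suffix_weights) for a text.

    Weights are relative frequencies. Words shorter than the affix
    length contribute the whole word (an assumption of this sketch).
    """
    # Runs of Unicode letters; digits, underscores and punctuation are skipped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    prefixes = Counter(w[:prefix_len] for w in words)
    suffixes = Counter(w[-suffix_len:] for w in words)

    def normalise(counter):
        total = sum(counter.values())
        return {k: v / total for k, v in counter.items()} if total else {}

    return normalise(prefixes), normalise(suffixes)


def similarity(a, b):
    """Hypothetical comparison function: overlap of two weight tables."""
    return sum(min(w, b.get(k, 0.0)) for k, w in a.items())


def identify(text, models):
    """Return the language whose stored models best match the text."""
    pre, suf = affix_model(text)
    scores = {
        lang: similarity(pre, m_pre) + similarity(suf, m_suf)
        for lang, (m_pre, m_suf) in models.items()
    }
    return max(scores, key=scores.get)


# Toy training data; the real system uses about 1.5 MB of text per language.
training = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is sleeping on the mat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze schläft auf der matte",
}
models = {lang: affix_model(text) for lang, text in training.items()}

print(identify("the dog is sleeping", models))
print(identify("der hund und die katze", models))
```

With realistic training texts, the stored models would be computed once and saved, and only the two models for the new text would be built at prediction time.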