This corpus was originally created for performance testing of the CorpusExplorer server infrastructure (see: diskurslinguistik.net / diskursmonitor.de). It comprises the German-only subset of the CommonCrawl database (as of March 2018). First, the URLs were filtered by top-level domain (de, at, ch). The texts were then classified using NTextCat, and only texts identified unambiguously as German were included in the corpus. Finally, the texts were annotated using TreeTagger (token, lemma, part-of-speech). The corpus contains 2.58 million documents, 232.87 million sentences and 3.021 billion tokens. You can use CorpusExplorer (http://hdl.handle.net/11234/1-2634) to convert this data into various other corpus formats (XML, JSON, Weblicht, TXM and many more).
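The top-level-domain filter described above can be illustrated with a minimal Python sketch. It is not the code used to build the corpus: the function names `has_allowed_tld` and `filter_urls` and the sample URLs are hypothetical, and only the three domains (de, at, ch) come from the description.

```python
from urllib.parse import urlsplit

# Top-level domains named in the corpus description.
ALLOWED_TLDS = {"de", "at", "ch"}


def has_allowed_tld(url: str) -> bool:
    """Return True if the URL's host ends in one of the allowed TLDs."""
    host = urlsplit(url).hostname  # None when the URL has no host part
    if not host:
        return False
    # Take the last dot-separated label of the host name as the TLD.
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in ALLOWED_TLDS


def filter_urls(urls):
    """Keep only URLs whose top-level domain is de, at or ch."""
    return [u for u in urls if has_allowed_tld(u)]


# Hypothetical sample URLs, for illustration only.
sample = [
    "https://www.example.de/a",
    "http://example.com/",
    "https://shop.example.at",
    "https://example.ch:8080/x",
    "not a url",
]
kept = filter_urls(sample)
```

A filter like this only narrows the candidate set; the language classification step (NTextCat in the original workflow) is still needed, since pages under these domains are not necessarily German.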
The database offers access to over 6 million dialect attestations from the project "Dictionary of Bavarian Dialects" (German: Das Bayerische Wörterbuch) as image snippets. The attestations are partly lemmatized, and lemmatization is ongoing.
The area covered by the Dictionary of Bavarian Dialects (Bayerisches Wörterbuch) comprises Upper Bavaria, Lower Bavaria, the Upper Palatinate and neighbouring regions of Bavarian Swabia, Middle Franconia and Upper Franconia. Over and above the vernaculars spoken today, Bavaria’s literary tradition since its beginnings in the 8th century is also taken into account.
Starting in 1913, language material was collected from all Bavarian-speaking regions in Bavaria. Questionnaires were sent out to local informants throughout Bavaria, and contemporary and historical literary sources were excerpted. Today the collection comprises around nine million dialect examples. With the exception of the “Wörterlisten” (word lists), which can be digitally searched and edited, this material consists of index cards, to which corresponding standard German or quasi-standard German keywords have been added, filed alphabetically (see link below for more information).
For detailed information, please see https://www.bwb.badw.de/en/the-project.html and https://www.bwb.badw.de/en/digital-platform.html
A petition for a referendum (called: "Schluss mit Gendersprache in Verwaltung und Bildung" / eng.: "abolition of gender language in administration and education") was launched in Hamburg in February 2023. The project "Empirical Gender Linguistics" at the "Leibniz Institute for the German Language" took this as an opportunity to scrape the entire "https://www.hamburg.de" website (except the list of ships in the Port of Hamburg and the yellow pages). The Hamburg.de website is the central digital contact point for citizens. The scraped texts were cleaned, processed and annotated using http://www.CorpusExplorer.de (TreeTagger - POS/Lemma information).
We use the corpus to analyze the use of words with gender signs.
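As a rough illustration of what a search for words with gender signs might look like, the following Python sketch matches word forms that contain an asterisk, colon or underscore followed by "in" or "innen". The choice of these three signs, the pattern itself and the function name `find_gendered_forms` are assumptions made for this example; the description does not state which signs or which method the project uses.

```python
import re

# Letters treated as part of a German word (assumption for this sketch).
LETTER = "A-Za-zÄÖÜäöüß"

# A word stem, one gender sign (*, : or _), then "in" or "innen".
GENDER_SIGN = re.compile(
    rf"(?<![{LETTER}])[{LETTER}]+[*:_](?:innen|in)(?![{LETTER}])"
)


def find_gendered_forms(text: str) -> list[str]:
    """Return all word forms in the text that contain a gender sign."""
    return GENDER_SIGN.findall(text)


# Hypothetical example sentence, not taken from the corpus.
example = "Alle Bürger*innen und Mitarbeiter:innen sowie Lehrer_innen sind eingeladen."
hits = find_gendered_forms(example)
```

A surface pattern like this will also produce false positives and misses (for example, forms with an internal capital I are not covered), so in practice the hits would need to be checked against the TreeTagger annotation or reviewed manually.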
This article deals with intercultural contact in branches of multinational companies or corporations founded in the Czech Republic after 1989 by German, Austrian or Swiss owners. Multinational businesses (large ones in particular) try to regulate communication within the company. This is achieved predominantly by introducing an official corporate language in the company, employing people fluent in the language, and promoting language courses.
In 9% of the companies, Czech is the sole corporate language; in 55% this role is filled by German, in 16% by English, in 15% by German and English, and in 5% by German and Czech. As for language courses, 64% of the companies support German courses, 19% Czech courses, and 48% English courses.
Our research, based on the analysis of questionnaires and semi-structured interview data, has shown that the foreign employees posted to the Czech Republic seldom adapt to the language of the local employees, while the adaptation of the local employees to the language of the foreign ones is not only usual but also expected. The regulation of communication therefore results primarily in asymmetrical language adaptation, which benefits the German, Austrian and Swiss owners and the German-speaking foreign employees delegated by them to the Czech Republic (the so-called expatriates). However, the companies examined also promote the use of English to a considerable extent, which provides a basis for symmetrical communication between local and expatriate employees.
In practice, however, this adaptation mainly concerns the management level, while production remains largely Czech in character. Non-adaptation is also widespread, leading to the use of interpreters and translators. This is the case, alongside asymmetrical adaptation and recourse to English, in 80% of the companies and in 95% of the large companies.
A detailed description of communication in one of the Siemens group's companies operating in the Czech Republic shows how positions in a manufacturing company are filled and which language qualifications are attached to them. It also shows how the corporate language changes, how intercultural communication actually proceeds with the help of linguistically qualified employees, and how these employees are prepared for their tasks, for example in language courses.
In her review of the Czech translation of Andreas Kossert’s Kalte Heimat: Die Geschichte der deutschen Vertriebenen nach 1945 (Munich, 2008), the reviewer welcomes the Czech edition, which presents a comparatively new and comprehensive view of the fate of the ethnic Germans who were expelled from central and eastern Europe after the Second World War and settled in both parts of divided Germany. In the Czech Republic, their fate is often perceived in a distorted way and on the basis of insufficient information. According to the reviewer, however, the author has unfortunately not avoided a degree of oversimplification, imprecision, and tendentiousness, which reflect on the one hand his lack of knowledge of Czech and Polish primary sources and secondary literature and, on the other, his personal involvement in the topic. Although he dispels the outwardly declared myth about the expellees’ successful integration into the Federal Republic of Germany, he himself paints a schematic picture of the ‘good expellees’ on the one hand and the remaining ‘wicked Germans’ on the other. [Review author] Sandra Kreisslová.
This package contains data sets for development and testing of machine translation of short medical search queries between Czech, English, French, and German. The queries come from the general public and medical experts. This work was supported by the EU FP7 project Khresmoi (European Commission contract No. 257528). The language resources are distributed by the LINDAT/Clarin project of the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2010013).
We thank Health on the Net Foundation for granting the license for the English general public queries, TRIP database for granting the license for the English medical expert queries, and three anonymous translators and three medical experts for translating and revising the data.
This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish and Swedish. The queries come from the general public and medical experts. This is version 2.0, which extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
This package contains data sets for development and testing of machine translation of sentences from summaries of medical articles between Czech, English, French, and German. This work was supported by the EU FP7 project Khresmoi (European Commission contract No. 257528). The language resources are distributed by the LINDAT/Clarin project of the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2010013). We thank all the data providers and copyright holders for providing the source data and anonymous experts for translating the sentences.
This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech, English, French, German, Hungarian, Polish, Spanish and Swedish. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
This article deals with Germanisms in Czech. The frequencies of 26 New High German loanwords and of the Czech synonyms competing with them were analyzed in the Czech National Corpus. On the basis of these frequency comparisons, the article tries to answer the question of whether native speakers of Czech prefer the Germanisms or their Czech equivalents. New High German loanwords were deliberately selected for the analysis in order to verify the topicality of the subject. Most of the study, however, takes a diachronic perspective: it shows not only the current situation but, in most cases, the frequency of the selected loanwords throughout their existence. Average frequencies were calculated for each century (since 1650), including the most recent modern period (from 1947 to 2008).