Search Results
122. L2 Acquisition P-Moll Norbert Dittmar
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- corpus
- Language:
- German, Italian, and Polish
- Description:
- Language Acquisition corpus
- Rights:
- Not specified
123. L2 Acquisition Ursula Stephany & Christine Dimroth
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- corpus
- Language:
- German
- Description:
- Language Acquisition corpus
- Rights:
- Not specified
124. Languages in Migration
- Creator:
- Bučková, Aneta, Nekula, Marek, Lukeš, David, Woźniak, Michał, Wastl, Michael, and Polowy, Louisa
- Publisher:
- Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague and Universität Regensburg
- Type:
- text and corpus
- Subject:
- spoken language, bilingual, syntactic annotation, migrant language, narrative interviews, and language biography
- Language:
- German and Czech
- Description:
- LANGUAGES IN MIGRATION is designed as a representation of authentic spoken Czech and German as used in informal speech (private environment, spontaneity, unpreparedness etc.) by Czech-German bilingual speakers who were born in Czechoslovakia around 1955 and left for Germany after the age of 12. The corpus is composed of interviews on language biographies, conducted from 2018–2020 with 20 speakers and narrated in Czech and German respectively. 10 interviews were recorded with late (German) repatriates and 10 with Czech migrants. The corpus includes transcripts of ca. 14 hours of Czech recordings and ca. 13.5 hours of German recordings. It contains 217 650 orthographic words (i.e. a total of 286 533 tokens including punctuation). The metadata of LANGUAGES IN MIGRATION include basic sociolinguistically relevant speaker categories (gender, year of birth and of migration, level of education, and region of childhood and present residence). The transcription of LANGUAGES IN MIGRATION is linked to the corresponding audio track. The transcription was carried out on the orthographic tier and supplemented by an additional metalanguage tier. The corpus LANGUAGES IN MIGRATION is lemmatized and morphologically tagged in different formats for Czech and German (Stuttgart-Tübingen-Tagset). Deviations from the norm of the spoken Czech and German of the homeland, which are understood as the result of language contact and language isolation, are tagged in a further tier in both the Czech and the German subcorpora of LANGUAGES IN MIGRATION. The (anonymized) corpus is provided in the form of transcripts in EAF format, which can be viewed via the freely available ELAN program, and in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available via the KonText query engine to registered users of the CNC at http://www.korpus.cz
- Rights:
- Czech National Corpus (Shuffled Corpus Data), https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc, and ACA
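The vertical format mentioned in the description above stores one token per line with tab-separated attributes, while structures appear as XML-like tags on lines of their own. The following is a minimal sketch of reading such a file and counting tokens versus orthographic words. The column order (word, lemma, tag), the `<s>` sentence structure, and the sample data are assumptions made for illustration and are not taken from the corpus documentation; the `$` test relies on the STTS convention that punctuation tags begin with `$`, so it would apply to the German part only.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, ...]


def read_vertical(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of token attribute tuples.

    Assumes sentences are delimited by <s> ... </s> (hypothetical here).
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Structure tags occupy a whole line, e.g. <doc ...>, <s>, </s>.
        if line.startswith("<") and line.endswith(">"):
            if line == "</s>" and sentence:
                yield sentence
                sentence = []
            continue
        # Token line: attributes separated by tabs.
        sentence.append(tuple(line.split("\t")))
    if sentence:
        yield sentence


def count_tokens_and_words(
    sentences: Iterable[List[Token]], tag_column: int = 2
) -> Tuple[int, int]:
    """Return (tokens, words); tokens tagged as punctuation are not words."""
    tokens = words = 0
    for sent in sentences:
        for tok in sent:
            tokens += 1
            tag = tok[tag_column] if len(tok) > tag_column else ""
            if not tag.startswith("$"):  # STTS punctuation: $. $, $(
                words += 1
    return tokens, words


# Invented sample, not corpus data.
sample = [
    '<doc id="demo">',
    "<s>",
    "Das\tder\tART",
    "ist\tsein\tVAFIN",
    "gut\tgut\tADJD",
    ".\t.\t$.",
    "</s>",
    "</doc>",
]
sentences = list(read_vertical(sample))
token_count, word_count = count_tokens_and_words(sentences)
```

The same distinction underlies the two figures quoted in the description: the token count includes punctuation, the word count does not.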
125. Large-Scale Colloquial Persian 0.5
- Creator:
- Abdi Khojasteh, Hadi, Ansari, Ebrahim, and Bohlouli, Mahdi
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and Institute for Advanced Studies in Basic Sciences (IASBS)
- Type:
- text and corpus
- Subject:
- PoS tagging, corpus, annotated corpus, multilingual, derivation, dependency parser, machine translation, informal language, spoken language, monolingual corpus, and bilingual corpus annotation
- Language:
- Persian, English, German, Czech, Italian, and Hindi
- Description:
- "Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets, annotated with dependency relations, part-of-speech tags and sentiment polarity, together with automatic translations of the original Persian sentences into five languages (EN, CS, DE, IT, HI).
- Rights:
- Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
126. Latvian National Digital Library “Letonica”
- Publisher:
- National Library of Latvia
- Type:
- corpus
- Language:
- German, Latvian, and Russian
- Description:
- Its aim is to digitise the collections of the National Library of Latvia and other similar organisations and to make them accessible on the Internet. The creation of the digital library lays the foundation for uniform principles of processing and storing the digitised materials and ensuring access to them.
- Rights:
- Not specified
127. Lexicon of Czech and German Anaphoric Connectives
- Creator:
- Rysová, Kateřina, Poláková, Lucie, Rysová, Magdaléna, and Mírovský, Jiří
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, lexicon, and lexicalConceptualResource
- Subject:
- lexicon, discourse, and bilingual
- Language:
- Czech and German
- Description:
- GeCzLex 1.0 is an online electronic resource for translation equivalents of Czech and German discourse connectives. It contains anaphoric connectives for both languages and their possible translations (not necessarily anaphoric) documented in bilingual parallel corpora. The entries have been interlinked via semantic annotation of the connectives (taken from the monolingual lexicons of connectives CzeDLex and DiMLex) according to the PDTB 3 sense taxonomy and via translation possibilities acquired from the Czech and German parallel data of the Intercorp project. The lexicon is the first bilingual inventory of connectives with linkage on the level of individual pairs (connective + discourse sense).
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
128. LIMAS Corpus
- Publisher:
- Korpora.org and Fakultät Geisteswissenschaften, Universität Duisburg-Essen
- Type:
- corpus
- Subject:
- German studies
- Language:
- German
- Description:
- 1970s "representative" corpus of German created by the research group "Linguistik und Maschinelle Sprachbearbeitung" (linguistics and language processing); a time-slice corpus of written German from 1970; a cross-section of various text types
- Rights:
- Not specified
129. Lingua::Interset 2.026
- Creator:
- Zeman, Daniel
- Publisher:
- Charles University, Faculty of Mathematics and Physics
- Type:
- tool and toolService
- Subject:
- morphology, part of speech, conversion, and tagset
- Language:
- Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Japanese, Multiple languages, and Portuguese
- Description:
- Lingua::Interset is a universal morphosyntactic feature set to which all tagsets of all corpora/languages can be mapped. Version 2.026 covers 37 different tagsets for 21 languages. Limited support for the older drivers for other languages (which are not included in this package but are available for download elsewhere) is also available; these will be fully ported to Interset 2 in the future. Interset is implemented as Perl libraries. It is also available via CPAN.
- Rights:
- Artistic License (Perl) 1.0, http://opensource.org/licenses/Artistic-Perl-1.0, and PUB
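The idea of mapping tagsets through a shared feature set, as described above, can be illustrated with a small sketch: a source tag is first decoded into a feature structure, which is then encoded into the closest tag of a target tagset. This is a conceptual illustration in Python only, not the Lingua::Interset Perl API. The two tiny tables, the feature names, the target tagset and the function names are all hypothetical; only the Penn Treebank tags `NN`, `NNS` and `VB` are real.

```python
from typing import Dict, List, Tuple

Features = Dict[str, str]

# Hypothetical decoding table: source tag -> shared feature structure.
PENN_TO_FEATURES: Dict[str, Features] = {
    "NN": {"pos": "noun", "number": "sing"},
    "NNS": {"pos": "noun", "number": "plur"},
    "VB": {"pos": "verb", "verbform": "inf"},
}

# Hypothetical (toy) target tagset, ordered from most to least specific.
TARGET_RULES: List[Tuple[Features, str]] = [
    ({"pos": "noun", "number": "plur"}, "N-PL"),
    ({"pos": "noun"}, "N"),
    ({"pos": "verb"}, "V"),
]


def decode(tag: str) -> Features:
    """Map a source tag to shared features (empty if the tag is unknown)."""
    return dict(PENN_TO_FEATURES.get(tag, {}))


def encode(features: Features) -> str:
    """Pick the first target tag whose required features all match."""
    for required, tag in TARGET_RULES:
        if all(features.get(key) == value for key, value in required.items()):
            return tag
    return "X"  # fallback for feature structures the target cannot express


def convert(tag: str) -> str:
    """Convert a source tag to the target tagset via the shared features."""
    return encode(decode(tag))
```

With a shared intermediate representation, each tagset needs only one decoder and one encoder rather than a dedicated converter for every pair of tagsets. Information the target tagset cannot express (here, the singular number of `NN`) is simply dropped during encoding.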
130. Mannheimer Texte Online (MATEO)
- Publisher:
- Universität Mannheim
- Type:
- corpus
- Subject:
- German studies
- Language:
- German and Latin
- Description:
- As a sub-section of MATEO, MARABU (Mannheimer Reihe Altes Buch) includes illustrated books, manuscripts and rare items, sources on the history of the Electoral Palatinate, and contributions on women of the humanist era.
- Rights:
- Not specified