Original context has metadata only: true / Rights: Not specified - LINDAT/CLARIAH-CZ Catalog Search Results

1. A Gold Standard Word Alignment for English-Swedish

Creator:: Ahrenberg, Lars and Holmqvist, Maria
Publisher:: Linköping University
Type:: text, wordList, and lexicalConceptualResource
Subject:: word alignment
Language:: Swedish and English
Description:: A Gold Standard Word Alignment for English-Swedish (GES) is a resource containing 1164 manually word-aligned sentence pairs from the English and Swedish versions of Europarl v. 2. The data can be found here: https://www.ida.liu.se/labs/nlplab/ges/
Rights:: Not specified

2. A simplified front-end for SemTi-Kamols morphological analyser

Publisher:: Institute of Mathematics and Computer Science, University of Latvia
Type:: toolService
Subject:: morphological analyzer
Language:: Latvian
Description:: A simplified front-end (in the form of a RESTful web service) of the SemTi-Kamols morphological analyzer. Mainly for demonstration purposes.
Rights:: Not specified

3. ABC - Language Identifier

Publisher:: Research Institute for Artificial Intelligence, Romanian Academy of Sciences
Type:: toolService
Description:: The application, developed in C#, automatically identifies the language of a text written in one of the 21 European Union languages. Using training texts in different languages (approx. 1.5Mb of text for each language), a training module counts the prefixes (the first 3 characters) and the suffixes (4-character endings) of all the words in the texts, for each language. For every language, two models are constructed, containing the weights (percentages) of prefixes and suffixes in the texts representing that language. In the prediction phase, two models are built on the fly for the new text in the same manner. These models are then compared with the stored models representing each language for which the application was trained. Using comparison functions, the best model is chosen. More detailed descriptions are available in the following papers (http://www.racai.ro/~tufis/papers):
-- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2008). RACAI's Linguistic Web Services. In Proceedings of the 6th Language Resources and Evaluation Conference - LREC 2008, Marrakech, Morocco, May 2008. ELRA - European Language Resources Association. ISBN 2-9517408-4-0.
-- Dan Tufiş and Alexandru Ceauşu (2007). Diacritics Restoration in Romanian Texts. In Elena Paskaleva and Milena Slavcheva (eds.), A Common Natural Language Processing Paradigm for Balkan Languages - RANLP 2007 Workshop Proceedings, pp. 49-56, Borovets, Bulgaria, September 2007. INCOMA Ltd., Shoumen, Bulgaria. ISBN 978-954-91743-8-0.
-- Dan Tufiş and Adrian Chiţu (1999). Automatic Insertion of Diacritics in Romanian Texts. In Ferenc Kiefer, Gábor Kiss, and Júlia Pajzs (eds.), Proceedings of the 5th International Workshop on Computational Lexicography (COMPLEX 1999), pp. 185-194, Pecs, Hungary, May 1999. Linguistics Institute, Hungarian Academy of Sciences.
Rights:: Not specified
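
The prefix/suffix approach described for the ABC language identifier can be illustrated with a minimal Python sketch. This is not the original C# implementation: the function names, the tokenization, and the `overlap` comparison function are assumptions made for illustration, since the record does not say which comparison functions the application uses.

```python
from collections import Counter

PREFIX_LEN = 3  # "the first 3 characters", per the description
SUFFIX_LEN = 4  # "4-character endings", per the description


def _to_percent(counts):
    """Convert raw counts into weights expressed as percentages."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()} if total else {}


def build_models(text):
    """Return (prefix_weights, suffix_weights) for a text.

    Tokenization by whitespace and lowercasing are assumptions.
    """
    words = [w for w in text.lower().split() if w.isalpha()]
    prefixes = Counter(w[:PREFIX_LEN] for w in words if len(w) >= PREFIX_LEN)
    suffixes = Counter(w[-SUFFIX_LEN:] for w in words if len(w) >= SUFFIX_LEN)
    return _to_percent(prefixes), _to_percent(suffixes)


def overlap(a, b):
    """Hypothetical comparison function: weight mass shared by two models."""
    return sum(min(w, b[k]) for k, w in a.items() if k in b)


def identify(text, trained):
    """Pick the language whose stored models best match the text's models.

    `trained` maps a language code to the (prefix, suffix) models
    returned by build_models for that language's training text.
    """
    pre, suf = build_models(text)
    return max(
        trained,
        key=lambda lang: overlap(pre, trained[lang][0])
        + overlap(suf, trained[lang][1]),
    )
```

In use, one would call `build_models` once per language on its training text, store the results in a dictionary, and pass that dictionary to `identify` together with the text to classify.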

4. Access rights Management System

Publisher:: Max Planck Institute for Psycholinguistics
Type:: toolService
Description:: A tool to grant and deny access to (parts of) an IMDI-based corpus. Supports advanced settings such as ACLs.
Rights:: Not specified

5. Álgu – Origins of Saami Words

Publisher:: The Research Institute for the Languages of Finland
Type:: lexicalConceptualResource
Language:: Northern Sami
Description:: The database will contain an etymological lexicon of Saami languages complete with detailed source citations. The database will be open to the public in November 2006 and will be updated regularly.
Rights:: Not specified

6. Álgu – Origins of Saami Words (Álgu – Saamen sanojen etymologinen tietokanta)

Type:: lexicalConceptualResource
Description:: 70,000 words, over 100,000 etymological relations, Relational database
Rights:: Not specified

7. Alpino Treebank

Publisher:: Center for Language and Cognition
Format:: application/xml
Type:: corpus
Language:: Dutch
Description:: A database of 7.000 syntactically analyzed Dutch sentences.
Rights:: Not specified

8. ALTWEB

Type:: corpus
Language:: Italian
Description:: Dialect (Tuscan); 380.000 entries; written; DBT tagset
Rights:: Not specified

9. Amara - universal subtitles

Type:: corpus
Language:: Arabic, Danish, Dutch, English, German, Modern Greek (1453-), Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish
Description:: Large set of subtitles available for download in multiple languages. Can be used as parallel corpus.
Rights:: Not specified

10. Anglo-Saxon charters

Publisher:: King's College London
Format:: application/tei+xml
Type:: corpus
Language:: English
Description:: Charters written in Anglo-Saxon England before A.D. 900, marked up in TEI XML. Browsable online.
Rights:: Not specified

11. Annex - Annotation Exploration tool

Publisher:: Max Planck Institute for Psycholinguistics
Type:: toolService
Description:: A tool in the MPI web-based framework for archive exploration (and enrichment).
Rights:: Not specified

12. ANNIS

Publisher:: University of Potsdam, Dept. of Linguistics and Humboldt-University Berlin, Institut für deutsche Sprache und Linguistik
Type:: toolService
Description:: ANNIS2 is an open-source, versatile, web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, has been designed to provide access to the data of the SFB 632 - "Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts". Since information structure interacts with linguistic phenomena on many levels, ANNIS2 addresses the SFB's need to concurrently annotate, query and visualize data from such varied areas as syntax, semantics, morphology, prosody, referentiality, lexis and more. For projects working with spoken language, support for audio / video annotations is also required.
Rights:: Not specified

13. Anotatornia

Publisher:: Institute of Computer Science, Polish Academy of Sciences
Type:: toolService
Description:: Tool for manual online annotation of corpora at various linguistic levels. The levels currently implemented are: word-level and sentence-level segmentation, morphosyntax, and word sense disambiguation. Anotatornia implements sophisticated mechanisms for managing texts, annotators and conflicts.
Rights:: Not specified

14. Apertium Old Catalan morphological analyzer

Publisher:: Universidad de Alicante
Type:: toolService
Subject:: morphological analyzer
Language:: Catalan
Description:: A RESTful morphological analyzer for Old Catalan.
Rights:: Not specified

15. Aquén - Toponimia galega

Publisher:: TALG Research Group (University of Vigo)
Type:: lexicalConceptualResource
Language:: Galician
Description:: Galician Toponymy Database, 40,000 entries
Rights:: Not specified

16. Arabic ACL corpus

Creator:: Salah Elfahal Elebaed, Hoyam, Kasbi, Mohammed, Nasri, Mohammed, and Bouzoubaa, Karim
Publisher:: International Journal of Computer Science Trends and Technology (IJCST)
Type:: text and corpus
Subject:: Controlled Natural Language, Arabic CNL, ACL, Arabic Corpus, and TEI
Language:: Arabic
Description:: This corpus comprises all sentences representing the Arabic Controlled Language (ACL). It contains 551 sentences taken from four textbooks and websites dedicated to teaching the Arabic language to children: a) First grade book, Republic of Sudan (كتاب الصف الاول جمهورية السودان), b) Al Jazeera Educational Site (موقع الجزيرة التعليمي), c) Bella Preparatory School Girls Forum (منتدى مدرسة بيلا الاعدادية بنات), and d) Albahr website (موقع انا البحر). These sentences respect 52 ACL rules. The average number of sentences per rule is 10.6. All sentences in the corpus were analyzed with the Farasa syntactic parser to confirm that they are parsed correctly. The parsing was validated manually by expert linguists. The corpus consists of a header and a body. The header is a set of metadata describing the corpus, such as the corpus name, the authors, the sources and further metadata. The body contains the rules. Each rule has a code, a structure and all sentences respecting that rule. For each sentence, we store an id, the vowelled and unvowelled text, and the result of parsing with Farasa.
Rights:: Not specified
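
The header/body layout described for the Arabic ACL corpus can be pictured with a small Python sketch. The class and field names below are hypothetical and chosen only to mirror the description; the record does not give the actual element names of the TEI encoding, and the Farasa parse is treated here as opaque text.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    id: str
    vowelled: str    # text with diacritics
    unvowelled: str  # text without diacritics
    parse: str       # Farasa parser output, kept as opaque text (assumption)


@dataclass
class Rule:
    code: str
    structure: str
    sentences: list = field(default_factory=list)


@dataclass
class Corpus:
    header: dict  # corpus name, authors, sources, further metadata
    rules: list = field(default_factory=list)

    def average_sentences_per_rule(self):
        """Mean number of sentences per rule (0.0 for an empty corpus)."""
        if not self.rules:
            return 0.0
        return sum(len(r.sentences) for r in self.rules) / len(self.rules)
```

The reported average follows from the stated totals: 551 sentences over 52 rules gives 551 / 52 ≈ 10.6.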

17. Araucaria

Publisher:: School of Computing, University of Dundee
Type:: toolService
Subject:: argument analyzer
Description:: Araucaria is a software tool for analysing arguments. It aids a user in reconstructing and diagramming an argument using a simple point-and-click interface. The software also supports argumentation schemes, and provides a user-customisable set of schemes with which to analyse arguments. Written in Java, released under the GNU General Public License.
Rights:: Not specified

18. Arborest

Type:: corpus
Language:: Estonian
Description:: 149 sentences, VISL tagset
Rights:: Not specified

19. Arts and Humanities Data Service Literature, Languages and Linguistics

Type:: corpus
Language:: English
Description:: Electronic texts, corpora, lexicons, other
Rights:: Not specified

20. Assigning lemmas and part-of-speech to wordform lists

Type:: toolService
Language:: Slovenian
Description:: online service
Rights:: Not specified

21. Atlas of Place Names

Publisher:: The Research Institute for the Languages of Finland
Type:: toolService
Language:: Finnish
Description:: The digital atlas illustrates the distribution of 234 common Finnish place-name elements based on data in the Names Archive.
Rights:: Not specified

22. Audio and video database of Latvian folklore

Publisher:: Archives of Latvian Folklore, Institute of Literature, Folklore and Art, University of Latvia
Format:: application/octet-stream
Type:: corpus
Language:: Latvian
Description:: The database contains audio and video material related to traditional culture - songs, folktales, legends, life stories and various collective or individual folklore related performances. The content has been either specifically contributed to the Archives of Latvian Folklore or collected by its staff members.
Rights:: Not specified

23. Audio Recordings Archive

Publisher:: The Research Institute for the Languages of Finland
Type:: corpus
Language:: Finnish
Description:: The Audio Recordings Archive (Suomen kielen nauhoitearkisto) holds over 23,000 hours of recordings collected since 1959, providing authentic samples of Finnish dialects, languages related to Finnish, and other world languages. The collection additionally includes samples of Finnish dialects spoken in Sweden, Norway, Ingria, the United States and Australia. Digitisation of the audio bank was undertaken in 1999. Over half of its content has been digitised, totalling about 13,000 hours of recordings.
Rights:: Not specified

24. BABEL Estonian Database

Publisher:: Institute of Cybernetics at Tallinn University of Technology
Type:: corpus
Language:: Estonian
Description:: The database consists of three sets:
- Many Talker Set: 30 males, 30 females; each to read 50 numbers, 1-2 connected passages, 1 block of "filler" sentences, and 1 block of syllables.
- Few Talker Set: 4 males, 4 females; each to read 50 numbers, 10 connected passages, 1 block of "filler" sentences, and 2-3 blocks of syllables.
- Very Few Talker Set: 1 male, 1 female; each to read 2 blocks of 50 numbers, 40 connected passages, 4 blocks of "filler" sentences, and 9 blocks of syllables.
Total amount: ca. 12 hours of speech.
Rights:: Not specified

25. Banco de neologismos 2004-2007

Publisher:: Instituto Cervantes and Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: lexicalConceptualResource
Subject:: neologisms database
Language:: Catalan
Description:: Repository of neologisms (15.375 entries)
Rights:: Not specified

26. Base de synonymes CRISCO

Type:: lexicalConceptualResource
Language:: French
Description:: 49.000, RDB
Rights:: Not specified

27. Basic vocabulary on the Human Genome

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: lexicalConceptualResource
Language:: Catalan, English, French, Galician, Italian, Portuguese, and Spanish
Description:: A vocabulary resulting from the cooperation of the groups of the REALITER network, collecting the basic terminology most commonly used in texts about genomics. It contains equivalents in English, Peninsular and Latin American Spanish, French, Italian, Galician, Portuguese and Catalan.
Rights:: Not specified

28. Bavaria's Dialects Online

Creator:: Raaf, Manuel
Publisher:: Bayerische Akademie der Wissenschaften and Bavarian Academy of Sciences and Humanities
Type:: text, machineReadableDictionary, and lexicalConceptualResource
Subject:: dictionary, web dictionary, Dialektologie, dialect variation, language variation, Dialectology, dialectology, Bavarian, Bavaria, Swabian, Frankish, Franconian Language, and spoken language
Language:: German, Bavarian, Swabian, and Frankish
Description:: Bavaria's Dialects Online (BDO) is the digital language information system of the three projects "Bavarian Dictionary", "Franconian Dictionary", and "Dialectological Information System of Bavarian Swabia". The database combines the research results of dialect research and presents dictionary articles as well as research data in a freely accessible online tool. BDO is not only aimed at scholars, but also at the lay public interested in the language. Here, the vocabulary of all Bavarian dialects is collected in one place and made accessible. The system shows the richness of the dialects of Bavaria in combination. With the new database, one will be able to compare the dialect vocabulary of Old Bavaria, Franconia and Swabia. Authentic dialect evidence is used to illustrate the dialect words in their variety of meanings and regional distribution, as well as to show their use in idioms, proverbs, and much more. BDO allows a whole new look at the vocabulary of the dialects of all parts of the state of Bavaria.
Rights:: Not specified

29. Berliner Wendekorpus

Publisher:: Berlin-Brandenburg Academy of Sciences and Humanities
Format:: application/tei+xml
Type:: corpus
Language:: German
Description:: Transcribed narrative interviews with people from East and West Berlin about the events of November 9. 282,000 tokens. TEI XML, lemma and POS. Normalized version also available.
Rights:: Not specified

30. Bibliografie zur deutschen Grammatik (BDG)

Publisher:: Institut für Deutsche Sprache
Type:: toolService
Language:: German
Description:: Online Bibliography, bibliographic database
Rights:: Not specified

31. Bibliotheca Augustana / Bibliotheca Germanica

Publisher:: Hochschule Augsburg
Type:: corpus
Subject:: Germanistik
Language:: German
Description:: Chronology of German literature (Old High German literature, Middle High German literature, Early New High German literature, New High German literature)
Rights:: Not specified

32. Bilder-Conversations-Lexikon

Type:: lexicalConceptualResource
Subject:: Germanistik
Language:: German
Description:: Digital edition of the first edition of the "Bilder-Conversations-Lexikon für das deutsche Volk" (1837-1841); a "handbook for the dissemination of generally useful knowledge and for entertainment" (self-description in the preface); contains numerous illustrations and maps
Rights:: Not specified

33. Bilingual English-Lithuanian, Lithuanian-English, Czech-Lithuanian, Lithuanian-Czech corpora

Publisher:: Center of Computational Linguistics, Vytautas Magnus University
Format:: application/xml
Type:: corpus
Language:: Czech, English, and Lithuanian
Description:: A collection of parallel corpora: English-Lithuanian (2m words), Lithuanian-English (0.06m words), Czech-Lithuanian (0.8m words), Lithuanian-Czech (0.02m words). All the corpora are searchable online via one interface at http://donelaitis.vdu.lt/main_en.php?id=4&nr=1_2. The collection is still being updated with new texts.
Rights:: Not specified

34. BitPar

Creator:: Schmid, Helmut
Publisher:: University of Stuttgart
Type:: toolService
Subject:: parser
Description:: Statistical parser
Rights:: Not specified

35. Bilingual Language Acquisition Julka Corpus

Publisher:: Max Planck Institute for Psycholinguistics
Type:: corpus
Language:: German and Polish
Description:: Language Acquisition corpus
Rights:: Not specified

36. BNF Converter

Publisher:: Språkbanken, Dept. of Swedish Language, Göteborg University
Type:: toolService
Subject:: compiler construction and grammar
Description:: The BNF Converter is a compiler construction tool generating a compiler front-end from a Labelled BNF grammar.
Rights:: Not specified

37. Bochumer Mittelhochdeutsch-Korpus

Publisher:: Ruhr-Universität Bochum
Type:: corpus
Subject:: Germanistik
Language:: German
Description:: Middle High German verse, prose pieces and charters
Rights:: Not specified

38. Bokmålsordboka

Publisher:: Department of Linguistics and Nordic Studies, University of Oslo
Type:: lexicalConceptualResource
Description:: 65 000 entries with definitions, etymology, examples
Rights:: Not specified

39. Bonner Frühneuhochdeutschkorpus (FnhdC)

Publisher:: Korpora.org and Fakultät Geisteswissenschaften, Universität Duisburg-Essen
Type:: corpus
Subject:: Germanistik
Language:: German
Description:: Digital, morphologically annotated (N, V, A) part of the Bonn Corpus of Early New High German; material basis for volumes 3, 4 and 6 of the "Grammatik des Frühneuhochdeutschen" (III. Nouns; IV. Verbs; VI. Adjectives)
Rights:: Not specified

40. Botanicus Digital Library

Type:: corpus
Subject:: Germanistik
Language:: Chinese, Czech, English, French, German, Latin, and Spanish
Description:: Digitized page images of historical botanical works from the Missouri Botanical Garden Library; German-language texts make up only part of the collection
Rights:: Not specified

41. British academic spoken English (BASE) corpus

Publisher:: Coventry University, University of Reading, University of Warwick
Format:: application/tei+xml
Type:: corpus
Language:: English
Description:: Transcribed recordings of 160 lectures and 39 seminars held in university departments. Four broad disciplinary groups, 1,644,942 tokens in total.
Rights:: Not specified

42. British National Corpus

Type:: corpus
Language:: English
Description:: General reference corpus; 100 million words; POS, lemma, descriptive metadata
Rights:: Not specified

43. Brockhaus' Kleines Konversations-Lexikon

Type:: lexicalConceptualResource
Subject:: Germanistik
Language:: German
Description:: 5th edition, 1911; focus on politics, economy, culture and technology at the beginning of the 20th century
Rights:: Not specified

44. Budapest Sociolinguistic Interview (BSI)

Publisher:: Academy of Sciences
Type:: corpus
Language:: Hungarian
Description:: BSI is a large-scale survey which provides reliable data on and analyses of the varieties of Hungarian spoken in Budapest.
Rights:: Not specified

45. Bulgarian CLEF Corpus

Type:: corpus
Language:: Bulgarian
Description:: Written, synchronic, general (newspapers)
Rights:: Not specified

46. Bulgarian-Croatian Comparable Corpus

Type:: corpus
Language:: Bulgarian and Croatian
Description:: written; domain-specific (newspaper); diachronic; bilingual; comparable; ca 3,500,000 tokens (393 Kw Bulgarian; 3.1 Mw Croatian)
Rights:: Not specified

47. BulTreeBank

Type:: corpus
Language:: Bulgarian
Description:: HPSG-based annotation including: constituent structure, dependency relations, named entities (classified as person, organisation, location or other names), coreferential relations. Annotation in XML
Rights:: Not specified

48. BulTreeBank Frequency List

Type:: lexicalConceptualResource
Language:: Bulgarian
Description:: 100 000 most frequent Cyrillic tokens in the BulTreeBank text archive, UTF-16 list of token-frequency pairs
Rights:: Not specified

49. BulTreeBank Morphological Analyzer

Creator:: Simov, Kiril and Osenova, Petya
Publisher:: Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:: toolService
Description:: Uses a morphological lexicon of Bulgarian (100 000 lemmas) compiled as a finite-state automaton in the CLaRK System. The text must first be tokenized; the analyzer is then applied to each token. Also includes guessers for unknown words and named-entity gazetteers. If the corresponding resources are available for a different language, the analyzer can be adapted to it.
Rights:: Not specified

50. BulTreeBank Morphosyntactic Corpus

Type:: corpus
Language:: Bulgarian
Description:: Written, synchronic, general, manually annotated, 1 000 000 tokens divided into three sets: 215 000 tokens used in the BulTreeBank HPSG Treebank (see below), a further 300 000 checked a second time, and the remaining ca. 480 000 checked by the annotators. Morphosyntactic annotation with the BulTreeBank Tagset (http://www.bultreebank.org/TechRep/BTB-TR03.pdf), XML; the annotation is described in the technical reports of the BulTreeBank project: http://www.bultreebank.org/TechRep
Rights:: Not specified
