Original context has metadata only: true / Rights: Not specified - LINDAT/CLARIAH-CZ Catalog Search Results

1. A Gold Standard Word Alignment for English-Swedish

Creator:: Ahrenberg, Lars and Holmqvist, Maria
Publisher:: Linköping University
Type:: text, wordList, and lexicalConceptualResource
Subject:: word alignment
Language:: Swedish and English
Description:: A Gold Standard Word Alignment for English-Swedish (GES) is a resource containing 1164 manually word-aligned sentence pairs from the English and Swedish versions of Europarl v. 2. The data can be found here: https://www.ida.liu.se/labs/nlplab/ges/
Rights:: Not specified

2. A simplified front-end for SemTi-Kamols morphological analyser

Publisher:: Institute of Mathematics and Computer Science, University of Latvia
Type:: toolService
Subject:: morphological analyzer
Language:: Latvian
Description:: A simplified front-end (in the form of a RESTful web service) for the SemTi-Kamols morphological analyzer. Mainly for demonstration purposes.
Rights:: Not specified

3. ABC - Language Identifier

Publisher:: Research Institute for Artificial Intelligence, Romanian Academy of Sciences
Type:: toolService
Description:: The application, developed in C#, automatically identifies the language of a text written in one of the 21 European Union languages. Using training texts in different languages (approx. 1.5Mb of text for each language), a training module counts the prefixes (the first 3 characters) and the suffixes (the last 4 characters) of all the words in the texts, for each language. For every language, two models are constructed, containing the weights (percentages) of the prefixes and suffixes in the texts representing that language. In the prediction phase, two models are built on the fly in the same manner for the new text. These models are then compared with the stored models for each language on which the application was trained. Using comparison functions, the best-matching model is chosen. More detailed descriptions are available in the following papers (http://www.racai.ro/~tufis/papers): -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2008). RACAI's Linguistic Web Services. In Proceedings of the 6th Language Resources and Evaluation Conference - LREC 2008, Marrakech, Morocco, May 2008. ELRA - European Language Resources Association. ISBN 2-9517408-4-0. -- Dan Tufiş and Alexandru Ceauşu (2007). Diacritics Restoration in Romanian Texts. In Elena Paskaleva and Milena Slavcheva (eds.), A Common Natural Language Processing Paradigm for Balkan Languages - RANLP 2007 Workshop Proceedings, pp. 49-56, Borovets, Bulgaria, September 2007. INCOMA Ltd., Shoumen, Bulgaria. ISBN 978-954-91743-8-0. -- Dan Tufiş and Adrian Chiţu (1999). Automatic Insertion of Diacritics in Romanian Texts. In Ferenc Kiefer, Gábor Kiss, and Júlia Pajzs (eds.), Proceedings of the 5th International Workshop on Computational Lexicography (COMPLEX 1999), pp. 185-194, Pecs, Hungary, May 1999. Linguistics Institute, Hungarian Academy of Sciences.
Rights:: Not specified
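
The prefix/suffix approach described for the ABC Language Identifier can be illustrated with a minimal Python sketch. This is not the tool's actual C# implementation: the function names, the whitespace tokenisation, and the overlap-based comparison function are assumptions made for illustration only, since the description does not say which comparison functions are used.

```python
from collections import Counter


def _normalise(counts):
    """Turn raw counts into relative weights that sum to 1."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def affix_weights(text, prefix_len=3, suffix_len=4):
    """Build the two models for a text: prefix weights and suffix weights.

    Assumption: words are lower-cased and split on whitespace.
    """
    words = text.lower().split()
    prefixes = Counter(w[:prefix_len] for w in words)
    suffixes = Counter(w[-suffix_len:] for w in words)
    return _normalise(prefixes), _normalise(suffixes)


def overlap(a, b):
    """Hypothetical comparison function: weight mass shared by two models."""
    return sum(min(w, b.get(k, 0.0)) for k, w in a.items())


def identify(text, models):
    """Return the language whose stored models best match the text's models."""
    pre, suf = affix_weights(text)
    return max(
        models,
        key=lambda lang: overlap(pre, models[lang][0]) + overlap(suf, models[lang][1]),
    )


# Toy training data; the real tool is described as using about 1.5Mb per language.
models = {
    "en": affix_weights("the quick brown fox jumps over the lazy dog and the cat"),
    "de": affix_weights("der schnelle braune fuchs springt über den faulen hund und die katze"),
}
guess = identify("the dog and the fox", models)
```

With realistic training data, each entry in `models` would be built once and stored, and only the models for the new text would be computed at prediction time.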

4. Access rights Management System

Publisher:: Max Planck Institute for Psycholinguistics
Type:: toolService
Description:: A tool for granting and denying access to (parts of) an IMDI-based corpus. Supports advanced settings such as ACLs.
Rights:: Not specified

5. Álgu – Origins of Saami Words

Publisher:: The Research Institute for the Languages of Finland
Type:: lexicalConceptualResource
Language:: Northern Sami
Description:: The database will contain an etymological lexicon of Saami languages complete with detailed source citations. The database will be open to the public in November 2006 and will be updated regularly.
Rights:: Not specified

6. Álgu – Origins of Saami Words (Álgu – Saamen sanojen etymologinen tietokanta)

Type:: lexicalConceptualResource
Description:: 70,000 words, over 100,000 etymological relations, Relational database
Rights:: Not specified

7. Alpino Treebank

Publisher:: Center for Language and Cognition
Format:: application/xml
Type:: corpus
Language:: Dutch
Description:: A database of 7,000 syntactically analyzed Dutch sentences.
Rights:: Not specified

8. ALTWEB

Type:: corpus
Language:: Italian
Description:: Dialect (Tuscan); 380,000 entries; written; DBT tagset
Rights:: Not specified

9. Amara - universal subtitles

Type:: corpus
Language:: Arabic, Danish, Dutch, English, German, Modern Greek (1453-), Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish
Description:: Large set of subtitles available for download in multiple languages. Can be used as parallel corpus.
Rights:: Not specified

10. Anglo-Saxon charters

Publisher:: King's College London
Format:: application/tei+xml
Type:: corpus
Language:: English
Description:: Charters written in Anglo-Saxon England before A.D. 900, marked up in TEI XML. Browsable online.
Rights:: Not specified

11. Annex - Annotation Exploration tool

Publisher:: Max Planck Institute for Psycholinguistics
Type:: toolService
Description:: A tool in the MPI web-based framework for archive exploration (and enrichment).
Rights:: Not specified

12. ANNIS

Publisher:: University of Potsdam, Dept. of Linguistics and Humboldt-University Berlin, Institut für deutsche Sprache und Linguistik
Type:: toolService
Description:: ANNIS2 is an open-source, versatile, web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, has been designed to provide access to the data of the SFB 632 - "Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts". Since information structure interacts with linguistic phenomena on many levels, ANNIS2 addresses the SFB's need to concurrently annotate, query and visualize data from such varied areas as syntax, semantics, morphology, prosody, referentiality, lexis and more. For projects working with spoken language, support for audio / video annotations is also required.
Rights:: Not specified

13. Anotatornia

Publisher:: Institute of Computer Science, Polish Academy of Sciences
Type:: toolService
Description:: Tool for manual on-line annotation of corpora at various linguistic levels. The levels currently implemented are: word-level and sentence-level segmentation, morphosyntax, and word sense disambiguation. Anotatornia implements sophisticated mechanisms for managing texts, annotators and conflicts.
Rights:: Not specified

14. Apertium Old Catalan morphological analyzer

Publisher:: Universidad de Alicante
Type:: toolService
Subject:: morphological analyzer
Language:: Catalan
Description:: A RESTful morphological analyzer for Old Catalan.
Rights:: Not specified

15. Aquén - Toponimia galega

Publisher:: TALG Research Group (University of Vigo)
Type:: lexicalConceptualResource
Language:: Galician
Description:: Galician Toponymy Database, 40,000 entries
Rights:: Not specified

16. Arabic ACL corpus

Creator:: Salah Elfahal Elebaed, Hoyam, Kasbi, Mohammed, Nasri, Mohammed, and Bouzoubaa, Karim
Publisher:: International Journal of Computer Science Trends and Technology (IJCST)
Type:: text and corpus
Subject:: Controlled Natural Language, Arabic CNL, ACL, Arabic Corpus, and TEI
Language:: Arabic
Description:: This corpus comprises all sentences representing the Arabic Controlled Language (ACL). It contains 551 sentences taken from four textbooks and websites dedicated to teaching the Arabic language to children: a) First grade book, Republic of Sudan (كتاب الصف الاول جمهورية السودان), b) Al Jazeera Educational Site (موقع الجزيرة التعليمي), c) Bella Preparatory School Girls Forum (منتدى مدرسة بيلا الاعدادية بنات), and d) Albahr website (موقع انا البحر). These sentences respect 52 ACL rules. The average number of sentences per rule is 10.6. All sentences in the corpus were analyzed by the Farasa syntactic parser to confirm that they are correctly analyzed. The validity of the parsing was checked manually by expert linguists. The corpus consists of a header and a body. The header holds a set of metadata that describe the corpus, such as the corpus name, the authors, the sources and further metadata. The body contains the rules. Each rule has a code, a structure and all sentences respecting that rule. For each sentence, we store an id, the vowelled and unvowelled text, and the result of parsing with Farasa.
Rights:: Not specified
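
The header/body layout described for the Arabic ACL corpus can be pictured as a small data model. The sketch below is an assumption for illustration only: the class and field names are hypothetical and do not reflect the corpus's actual encoding or element names.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    id: str
    vowelled: str      # text with diacritics
    unvowelled: str    # text without diacritics
    parse: str         # stored parser output for this sentence


@dataclass
class Rule:
    code: str
    structure: str
    sentences: list[Sentence] = field(default_factory=list)


@dataclass
class Corpus:
    header: dict[str, str]                 # corpus name, authors, sources, ...
    rules: list[Rule] = field(default_factory=list)

    def sentence_count(self) -> int:
        return sum(len(rule.sentences) for rule in self.rules)

    def mean_sentences_per_rule(self) -> float:
        return self.sentence_count() / len(self.rules) if self.rules else 0.0


# Toy instance with placeholder values (not real corpus data).
corpus = Corpus(
    header={"name": "example"},
    rules=[
        Rule("R1", "placeholder structure", [Sentence("s1", "a", "a", "p"), Sentence("s2", "b", "b", "p")]),
        Rule("R2", "placeholder structure", [Sentence("s3", "c", "c", "p")]),
    ],
)
mean = corpus.mean_sentences_per_rule()
```

With the figures given in the description, 551 sentences over 52 rules gives 551 / 52 ≈ 10.6 sentences per rule, which matches the stated average.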

17. Araucaria

Publisher:: School of Computing, University of Dundee
Type:: toolService
Subject:: argument analyzer
Description:: Araucaria is a software tool for analysing arguments. It aids a user in reconstructing and diagramming an argument using a simple point-and-click interface. The software also supports argumentation schemes, and provides a user-customisable set of schemes with which to analyse arguments. Written in Java, released under the GNU General Public License.
Rights:: Not specified

18. Arborest

Type:: corpus
Language:: Estonian
Description:: 149 sentences, VISL tagset
Rights:: Not specified

19. Arts and Humanities Data Service Literature, Languages and Linguistics

Type:: corpus
Language:: English
Description:: Electronic texts, corpora, lexicons, and other resources.
Rights:: Not specified

20. Assigning lemmas and part-of-speech to wordform lists

Type:: toolService
Language:: Slovenian
Description:: online service
Rights:: Not specified
