Rights: Not specified - LINDAT/CLARIAH-CZ Catalog Search Results


1. A Gold Standard Word Alignment for English-Swedish

Creator:: Ahrenberg, Lars and Holmqvist, Maria
Publisher:: Linköping University
Type:: text, wordList, and lexicalConceptualResource
Subject:: word alignment
Language:: Swedish and English
Description:: A Gold Standard Word Alignment for English-Swedish (GES) is a resource containing 1164 manually word-aligned sentence pairs from the English and Swedish versions of Europarl v. 2. The data can be found here: https://www.ida.liu.se/labs/nlplab/ges/
Rights:: Not specified

2. A simplified front-end for SemTi-Kamols morphological analyser

Publisher:: Institute of Mathematics and Computer Science, University of Latvia
Type:: toolService
Subject:: morphological analyzer
Language:: Latvian
Description:: A simplified front-end (in the form of a RESTful web service) for the SemTi-Kamols morphological analyzer, intended mainly for demonstration purposes.
Rights:: Not specified

3. ABC - Language Identifier

Publisher:: Research Institute for Artificial Intelligence, Romanian Academy of Sciences
Type:: toolService
Description:: The application, developed in C#, automatically identifies the language of a text written in one of the 21 European Union languages. Using training texts in different languages (approx. 1.5 MB of text per language), a training module counts the prefixes (the first 3 characters) and the suffixes (the last 4 characters) of all the words in the texts for each language. For every language, two models are constructed, containing the weights (percentages) of the prefixes and suffixes in the texts representing that language. In the prediction phase, two models are built on the fly for the new text in the same manner. These models are then compared with the stored models of each language for which the application was trained, and the best-matching model is chosen using comparison functions. More detailed descriptions are available in the following papers (http://www.racai.ro/~tufis/papers):
-- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2008). RACAI's Linguistic Web Services. In Proceedings of the 6th Language Resources and Evaluation Conference - LREC 2008, Marrakech, Morocco, May 2008. ELRA - European Language Resources Association. ISBN 2-9517408-4-0.
-- Dan Tufiş and Alexandru Ceauşu (2007). Diacritics Restoration in Romanian Texts. In Elena Paskaleva and Milena Slavcheva (eds.), A Common Natural Language Processing Paradigm for Balkan Languages - RANLP 2007 Workshop Proceedings, pp. 49-56, Borovets, Bulgaria, September 2007. INCOMA Ltd., Shoumen, Bulgaria. ISBN 978-954-91743-8-0.
-- Dan Tufiş and Adrian Chiţu (1999). Automatic Insertion of Diacritics in Romanian Texts. In Ferenc Kiefer, Gábor Kiss, and Júlia Pajzs (eds.), Proceedings of the 5th International Workshop on Computational Lexicography (COMPLEX 1999), pp. 185-194, Pecs, Hungary, May 1999. Linguistics Institute, Hungarian Academy of Sciences.
Rights:: Not specified
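
The prefix/suffix approach described for ABC - Language Identifier can be illustrated with a minimal Python sketch. This is not the original C# application: the function names (`affix_models`, `similarity`, `train`, `identify`) are hypothetical, and because the description does not say which comparison functions are used, a simple overlap of weights is assumed here.

```python
import re
from collections import Counter


def normalise(counts):
    """Turn raw counts into weights (relative frequencies summing to 1)."""
    total = sum(counts.values())
    return {affix: n / total for affix, n in counts.items()} if total else {}


def affix_models(text, prefix_len=3, suffix_len=4):
    """Build the two models for a text: prefix weights and suffix weights."""
    # Words are runs of Unicode letters; digits and punctuation are skipped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    prefixes = Counter(w[:prefix_len] for w in words if len(w) >= prefix_len)
    suffixes = Counter(w[-suffix_len:] for w in words if len(w) >= suffix_len)
    return normalise(prefixes), normalise(suffixes)


def similarity(model_a, model_b):
    """Assumed comparison function: overlap of weights on shared affixes."""
    return sum(min(w, model_b[a]) for a, w in model_a.items() if a in model_b)


def train(samples):
    """samples maps a language code to its training text."""
    return {lang: affix_models(text) for lang, text in samples.items()}


def identify(text, models):
    """Return the language whose stored models best match the text's models."""
    prefixes, suffixes = affix_models(text)

    def score(lang):
        lang_prefixes, lang_suffixes = models[lang]
        return similarity(prefixes, lang_prefixes) + similarity(suffixes, lang_suffixes)

    return max(models, key=score)
```

With realistic amounts of training text per language, `identify(new_text, train(samples))` returns the code of the best-matching language; on very short inputs the scores can tie, in which case the first language in `models` wins.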

4. Access rights Management System

Publisher:: Max Planck Institute for Psycholinguistics
Type:: toolService
Description:: A tool to grant and deny access to (parts of) an IMDI-based corpus. Supports advanced settings such as ACLs.
Rights:: Not specified

5. Álgu – Origins of Saami Words

Publisher:: The Research Institute for the Languages of Finland
Type:: lexicalConceptualResource
Language:: Northern Sami
Description:: The database will contain an etymological lexicon of Saami languages complete with detailed source citations. The database will be open to the public in November 2006 and will be updated regularly.
Rights:: Not specified

6. Álgu – Origins of Saami Words (Álgu – Saamen sanojen etymologinen tietokanta)

Type:: lexicalConceptualResource
Description:: 70,000 words, over 100,000 etymological relations, Relational database
Rights:: Not specified

7. Alpino Treebank

Publisher:: Center for Language and Cognition
Format:: application/xml
Type:: corpus
Language:: Dutch
Description:: A database of 7,000 syntactically analyzed Dutch sentences.
Rights:: Not specified

8. ALTWEB

Type:: corpus
Language:: Italian
Description:: Dialect (Tuscan); 380,000 entries; written; DBT tagset
Rights:: Not specified

9. Amara - universal subtitles

Type:: corpus
Language:: Arabic, Danish, Dutch, English, German, Modern Greek (1453-), Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish
Description:: Large set of subtitles available for download in multiple languages. Can be used as parallel corpus.
Rights:: Not specified

10. Anglo-Saxon charters

Publisher:: King's College London
Format:: application/tei+xml
Type:: corpus
Language:: English
Description:: Charters written in Anglo-Saxon England before A.D. 900, marked up in TEI XML. Browsable online.
Rights:: Not specified


