Original context has metadata only: true / Rights: Not specified - LINDAT/CLARIAH-CZ Catalog Search Results

Search constraints: Rights: Not specified; Original context has metadata only: true; Date: 2000 to 2024

1. A Gold Standard Word Alignment for English-Swedish

Creator:: Ahrenberg, Lars and Holmqvist, Maria
Publisher:: Linköping University
Type:: text, wordList, and lexicalConceptualResource
Subject:: word alignment
Language:: Swedish and English
Description:: A Gold Standard Word Alignment for English-Swedish (GES) is a resource containing 1164 manually word-aligned sentence pairs from the English and Swedish versions of Europarl v. 2. The data can be found here: https://www.ida.liu.se/labs/nlplab/ges/
Rights:: Not specified

2. Alpino Treebank

Publisher:: Center for Language and Cognition
Format:: application/xml
Type:: corpus
Language:: Dutch
Description:: A database of 7,000 syntactically analyzed Dutch sentences.
Rights:: Not specified

3. Anglo-Saxon charters

Publisher:: King's College London
Format:: application/tei+xml
Type:: corpus
Language:: English
Description:: Charters written in Anglo-Saxon England before A.D. 900, marked-up in TEI XML. Browsable online.
Rights:: Not specified

4. Arabic ACL corpus

Creator:: Salah Elfahal Elebaed, Hoyam, Kasbi, Mohammed, Nasri, Mohammed, and Bouzoubaa, Karim
Publisher:: International Journal of Computer Science Trends and Technology (IJCST)
Type:: text and corpus
Subject:: Controlled Natural Language, Arabic CNL, ACL, Arabic Corpus, and TEI
Language:: Arabic
Description:: This corpus comprises all sentences representing the Arabic Controlled Language (ACL). It contains 551 sentences taken from four textbooks and websites dedicated to teaching Arabic to children, such as: a) First grade book, Republic of Sudan (كتاب الصف الاول جمهورية السودان), b) Al Jazeera Educational Site (موقع الجزيرة التعليمي), c) Bella Preparatory School Girls Forum (منتدى مدرسة بيلا الاعدادية بنات), and d) Albahr website (موقع انا البحر). These sentences conform to 52 ACL rules, an average of 10.6 sentences per rule. All sentences in the corpus were analyzed with the Farasa syntactic parser, and the parses were validated manually by linguists. The corpus consists of a header and a body. The header holds metadata describing the corpus, such as the corpus name, the authors, the sources and further metadata. The body contains the rules. Each rule has a code, a structure and all sentences conforming to that rule. For each sentence, the corpus stores an id, the vowelled and unvowelled text, and the result of parsing with Farasa.
Rights:: Not specified
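
The header/body layout described for the Arabic ACL corpus can be pictured with a small data model. This is only an illustrative sketch: the class and field names (`Corpus`, `Rule`, `Sentence`, `vowelled`, `parse`, and so on) are hypothetical and are not taken from the corpus files, whose actual TEI element names are not given in this record.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sentence:
    # Hypothetical field names; the record only says each sentence has an id,
    # vowelled and unvowelled text, and a Farasa parsing result.
    id: str
    vowelled: str
    unvowelled: str
    parse: str


@dataclass
class Rule:
    # Each rule has a code, a structure, and the sentences that follow it.
    code: str
    structure: str
    sentences: List[Sentence] = field(default_factory=list)


@dataclass
class Corpus:
    # The header is a set of metadata (name, authors, sources, ...).
    header: Dict[str, str]
    rules: List[Rule] = field(default_factory=list)

    def average_sentences_per_rule(self) -> float:
        """Mean number of sentences per rule (0.0 for an empty corpus)."""
        if not self.rules:
            return 0.0
        total = sum(len(rule.sentences) for rule in self.rules)
        return total / len(self.rules)


# The figures quoted in the description are consistent with each other:
# 551 sentences over 52 rules is about 10.6 sentences per rule.
reported_average = round(551 / 52, 1)
```

With the real data loaded into such a model, `average_sentences_per_rule()` would be expected to give roughly the 10.6 stated in the description.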

5. Atlas of Place Names

Publisher:: The Research Institute for the Languages of Finland
Type:: toolService
Language:: Finnish
Description:: The digital atlas illustrates the distribution of 234 common Finnish place-name elements based on data in the Names Archive.
Rights:: Not specified

6. Berliner Wendekorpus

Publisher:: Berlin-Brandenburg Academy of Sciences and Humanities
Format:: application/tei+xml
Type:: corpus
Language:: German
Description:: Transcribed narrative interviews with people from East and West Berlin about the events of November 9. 282,000 tokens. TEI XML, lemma and POS. Normalized version also available.
Rights:: Not specified

7. British academic spoken English (BASE) corpus

Publisher:: Coventry University, University of Reading, University of Warwick
Format:: application/tei+xml
Type:: corpus
Language:: English
Description:: Transcribed recordings of 160 lectures and 39 seminars held in university departments. Four broad disciplinary groups, 1,644,942 tokens in total.
Rights:: Not specified

8. CAST corpus (Computer-Aided Summarisation Tool)

Publisher:: Research Group in Computational Linguistics, University of Wolverhampton
Type:: corpus
Language:: English
Description:: Sentences annotated for important units of text for summarisation. 145,473 words / 6584 sentences
Rights:: Not specified

9. Česílko 2.0 Shallow Transfer RBMT framework (opensource version)

Creator:: Vičič, Jernej, Kuboň, Vladislav, and Homola, Petr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: Shallow Parse, Shallow Transfer Rule-Based Machine Translation, stochastic ranker, related languages, and toolbox
Description:: The system Česílko (language data and software tools) was first developed as an answer to a growing need for translation and localisation from one source language to many target languages. The original system belonged to the Shallow Parse, Shallow Transfer Rule-Based Machine Translation (RBMT) paradigm and was designed primarily for translation between related languages. The latest implementation of the system uses a stochastic ranker, so technically it belongs to the hybrid machine translation paradigm, combining stochastic methods with the traditional Shallow Transfer RBMT methods. The system has been stripped of the accompanying language resources due to copyright restrictions. The data that is available is for demonstration purposes only.
Rights:: Not specified

10. Code-switching conversation corpus

Publisher:: Max Planck Institute for Psycholinguistics
Type:: corpus
Language:: Dutch
Description:: The code-switching corpus consists of 5x30-minute conversations between four speakers (i.e. a total of 20 speakers). The speakers are bilingual speakers of Papiamento (a creole language spoken in the Dutch Antilles) and Dutch. In the course of their free conversations, they engage in code-switching, that is, they use both languages within the same utterance in systematic ways. The corpus is fully transcribed and glossed, and coded for language and word class, in ELAN.
Rights:: Not specified
