Creator: Jawaid, Bushra / Language: Urdu / Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)

1. English-Urdu Religious Parallel Corpus

Creator:: Jawaid, Bushra and Zeman, Daniel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: parallel corpus, religious text, and machine translation
Language:: English and Urdu
Description:: The English-Urdu parallel corpus is a collection of religious texts (Quran, Bible) in English and Urdu with sentence alignments. The corpus can be used for experiments with statistical machine translation. Our modifications of the crawled data include, but are not limited to, the following: 1- Manually corrected sentence alignment of the corpora. 2- Our data split (training-development-test), so that our published experiments can be reproduced. 3- Tokenization (optional, but needed to reproduce our experiments). 4- Normalization (optional) of, e.g., European vs. Urdu numerals and European vs. Urdu punctuation, and removal of Urdu diacritics.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
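
The normalization step listed in the description (numerals, punctuation, diacritics) could look roughly like the following Python sketch. It is an illustration only, not the corpus authors' script: the function name `normalize_urdu`, the direction of the mapping (Urdu forms to European forms), and the exact character sets are assumptions made here.

```python
import re

# Assumption: "Urdu numerals" means the Extended Arabic-Indic digits
# U+06F0..U+06F9; they are mapped to the European digits 0-9.
DIGIT_MAP = {0x06F0 + i: str(i) for i in range(10)}

# Assumption: a small, illustrative set of Urdu punctuation marks,
# mapped to their European counterparts.
PUNCT_MAP = {
    0x06D4: ".",  # ARABIC FULL STOP
    0x060C: ",",  # ARABIC COMMA
    0x061F: "?",  # ARABIC QUESTION MARK
    0x061B: ";",  # ARABIC SEMICOLON
}

# Assumption: diacritics are the combining marks U+064B..U+0652
# (fathatan through sukun) plus U+0670 (superscript alef).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def normalize_urdu(text: str) -> str:
    """Hypothetical normalizer: unify digits and punctuation, drop diacritics."""
    text = text.translate(DIGIT_MAP)
    text = text.translate(PUNCT_MAP)
    return DIACRITICS.sub("", text)


if __name__ == "__main__":
    # A word carrying a kasra (U+0650), followed by an Urdu full stop.
    sample = "\u06a9\u0650\u062a\u0627\u0628\u06d4"
    print(normalize_urdu(sample))
```

Whether to normalize toward European or toward Urdu forms is a design choice; the sketch picks one direction only to keep the example short.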

2. Urdu Monolingual Corpus

Creator:: Jawaid, Bushra, Kamran, Amir, and Bojar, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, corpus, other, and lexicalConceptualResource
Subject:: Urdu, monolingual data, annotated data, and corpus
Language:: Urdu
Description:: We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012), who use three different taggers and then apply a voting scheme to disambiguate among the different choices suggested by each tagger. We run this complex ensemble on a large monolingual corpus and release both the plain and the tagged corpora. The work is supported by the MosesCore project sponsored by the European Commission’s Seventh Framework Programme (Grant Number 288487).
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
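
The description mentions a voting scheme over the outputs of three taggers without giving its details. A minimal Python sketch of one plausible scheme, simple majority voting per token, is shown below. The function names `vote` and `vote_sentence`, the tag labels, and the tie-breaking rule (fall back to one designated tagger when all taggers disagree) are assumptions for illustration, not the method of Jawaid and Bojar (2012).

```python
from collections import Counter
from typing import List, Sequence


def vote(tags: Sequence[str], fallback_index: int = 0) -> str:
    """Return the tag chosen by a strict majority of taggers.

    Assumption: if no tag has a strict majority (e.g. three taggers give
    three different tags), the tag of the tagger at `fallback_index` wins.
    """
    best, count = Counter(tags).most_common(1)[0]
    if count > len(tags) / 2:
        return best
    return tags[fallback_index]


def vote_sentence(tagger_outputs: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tags from several taggers for one sentence.

    `tagger_outputs` holds one tag sequence per tagger; all sequences are
    assumed to have the same length (same tokenization).
    """
    return [vote(token_tags) for token_tags in zip(*tagger_outputs)]


if __name__ == "__main__":
    # Hypothetical tag labels for a two-token sentence from three taggers.
    outputs = [["NN", "VB"], ["NN", "JJ"], ["PN", "JJ"]]
    print(vote_sentence(outputs))
```

A weighted vote (for example, by each tagger's accuracy on held-out data) would be a natural alternative to the fixed fallback used in this sketch.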
