Search Results
2. Arabic ACL corpus
- Creator:
- Salah Elfahal Elebaed, Hoyam, Kasbi, Mohammed, Nasri, Mohammed, and Bouzoubaa, Karim
- Publisher:
- International Journal of Computer Science Trends and Technology (IJCST)
- Type:
- text and corpus
- Subject:
- Controlled Natural Language, Arabic CNL, ACL, Arabic Corpus, and TEI.
- Language:
- Arabic
- Description:
- This corpus contains all sentences representing the Arabic Controlled Language (ACL). It comprises 551 sentences taken from four textbooks and websites dedicated to teaching Arabic to children: a) First grade book, Republic of Sudan (كتاب الصف الاول جمهورية السودان), b) Al Jazeera Educational Site (موقع الجزيرة التعليمي), c) Bella Preparatory School Girls Forum (منتدى مدرسة بيلا الاعدادية بنات), and d) Albahr website (موقع انا البحر). These sentences conform to 52 ACL rules, an average of 10.6 sentences per rule. All sentences in the corpus were parsed with the Farasa syntactic parser, and linguistic experts manually validated the parses. The corpus consists of a header and a body. The header holds metadata describing the corpus, such as the corpus name, the authors, the sources, and further metadata. The body contains the rules. Each rule has a code, a structure, and all sentences that conform to it. For each sentence, the corpus stores an id, the vowelled and unvowelled text, and the Farasa parsing result.
- Rights:
- Not specified
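The header/body layout described for the Arabic ACL corpus can be pictured with a minimal Python sketch. All class and field names here (`Header`, `Rule`, `Sentence`, `farasa_parse`, and so on) are hypothetical and chosen only for illustration; they are not the element names of the corpus's actual TEI encoding, and the parse result is simply held as an opaque string.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    # Hypothetical field names; the real corpus may label these differently.
    sentence_id: str
    vowelled: str
    unvowelled: str
    farasa_parse: str  # parser output kept as an opaque string


@dataclass
class Rule:
    code: str
    structure: str
    sentences: List[Sentence] = field(default_factory=list)


@dataclass
class Header:
    name: str
    authors: List[str]
    sources: List[str]


@dataclass
class Corpus:
    header: Header
    rules: List[Rule] = field(default_factory=list)

    def sentence_count(self) -> int:
        # Total number of sentences across all rules.
        return sum(len(rule.sentences) for rule in self.rules)

    def average_sentences_per_rule(self) -> float:
        # Mean number of sentences per rule; 0.0 for a corpus with no rules.
        if not self.rules:
            return 0.0
        return self.sentence_count() / len(self.rules)
```

With the figures given in the description, the same average is 551 / 52, which rounds to the reported 10.6 sentences per rule.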
3. CorpusExplorer
- Creator:
- Rüdiger, Jan Oliver
- Publisher:
- Jan Oliver Rüdiger
- Type:
- tool and toolService
- Subject:
- Corpus Linguistics, NLP, conll, tei, XML, Natural Language Processing, Linguistics, Computational Linguistics, corpus processing, tagger, POS tagger, lemmatization, text cleaning, CommonCrawl, epub, JSON, Twitter, Pandoc, Wikipedia, digital data, DTA, DSpin, MySQL, ElasticSearch, TextGrid, text corpora, TigerXML, and WeblichtXML
- Language:
- German, English, French, Italian, Dutch, Spanish, Polish, Arabic, Chinese, and Portuguese
- Description:
- Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks such as text acquisition, cleaning, and tagging are completely automated. The simple interface supports use in university teaching and leads users and students to fast, substantial results. The CorpusExplorer supports many standards (XML, CSV, JSON, R, etc.) and also offers its own software development kit (SDK). Source code is available at https://github.com/notesjor/corpusexplorer2.0
- Rights:
- Not specified
4. Dutch Bilingualism Data Base (DBD)
- Publisher:
- Radboud University Nijmegen, Max Planck Institute for Psycholinguistics, Meertens Institute KNAW The Netherlands, and Babylon Centre for Studies of Multilingualism in the Multicultural Society
- Type:
- corpus
- Language:
- Arabic, Dutch, and Turkish
- Description:
- Audio recordings, transcripts,
- Rights:
- Not specified
5. JIRS
- Publisher:
- Grid and High Performance Computing Group, ITACA, Universidad Politécnica de Valencia and Universidad de Alicante
- Type:
- toolService
- Language:
- Arabic, English, French, Italian, Oromo, and Urdu
- Description:
- JIRS is a Passage Retrieval system specially suited for Question Answering. It can be adapted to other languages very easily. Task (Written Language): Information Retrieval Applications, Question/Answering. Environment: OS-independent. Access: GPLv3.
- Rights:
- Not specified
6. OrienTel Telephone databases
- Type:
- corpus
- Subject:
- Multilingual access to interactive communication services for the Mediterranean and the Middle East
- Language:
- Modern Greek (1453-), Turkish, Arabic, and Hebrew
- Description:
- Collection of telephone databases from the Mediterranean region, including (variants of) Arabic. Each database has 500-1000 speakers, all orthographically transcribed. Speaker information covers gender, age, and accent. Phonetic lexicons are included.
- Rights:
- Not specified