This corpus comprises the sentences that represent the Arabic Controlled Language (ACL). It contains 551 sentences taken from four textbooks and websites dedicated to teaching Arabic to children: a) First grade book, Republic of Sudan (كتاب الصف الاول جمهورية السودان), b) Al Jazeera Educational Site (موقع الجزيرة التعليمي), c) Bella Preparatory School Girls Forum (منتدى مدرسة بيلا الاعدادية بنات), and d) Albahr website (موقع انا البحر). The sentences conform to 52 ACL rules, with an average of 10.6 sentences per rule. All sentences in the corpus were parsed with the Farasa syntactic parser to confirm that they are analyzed correctly, and the parses were validated manually by expert linguists.
The corpus consists of a header and a body. The header holds metadata describing the corpus, such as the corpus name, the authors, and the sources. The body contains the rules. Each rule has a code, a structure, and all the sentences that conform to that rule. For each sentence, we store an id, the vowelled and unvowelled text, and the result of parsing with Farasa.
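The header and body layout described above can be pictured as a small set of nested records. The following is only a sketch under stated assumptions: the class names, field names, rule codes, and placeholder values are hypothetical and are not taken from the corpus files, whose actual serialization format is not specified here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    sentence_id: str   # hypothetical field name for the sentence id
    vowelled: str      # text with diacritics
    unvowelled: str    # text without diacritics
    parse: str         # Farasa output, kept as an opaque string here


@dataclass
class Rule:
    code: str          # rule code
    structure: str     # rule structure
    sentences: List[Sentence] = field(default_factory=list)


@dataclass
class Header:
    name: str
    authors: List[str]
    sources: List[str]


@dataclass
class Corpus:
    header: Header
    rules: List[Rule] = field(default_factory=list)

    def sentence_count(self) -> int:
        # Total number of sentences across all rules.
        return sum(len(rule.sentences) for rule in self.rules)

    def average_sentences_per_rule(self) -> float:
        # Mean number of sentences per rule; 0.0 for an empty body.
        if not self.rules:
            return 0.0
        return self.sentence_count() / len(self.rules)


# Toy instance with placeholder values only (not real corpus data).
example = Corpus(
    header=Header(name="placeholder", authors=[], sources=[]),
    rules=[
        Rule(code="R1", structure="placeholder", sentences=[
            Sentence("s1", "placeholder", "placeholder", "placeholder"),
            Sentence("s2", "placeholder", "placeholder", "placeholder"),
        ]),
        Rule(code="R2", structure="placeholder", sentences=[
            Sentence("s3", "placeholder", "placeholder", "placeholder"),
        ]),
    ],
)
print(example.average_sentences_per_rule())  # 1.5
```

Applied to the full corpus, the same computation gives 551 / 52, which rounds to the reported average of 10.6 sentences per rule.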
The database contains audio and video material related to traditional culture: songs, folktales, legends, life stories, and various collective or individual folklore-related performances. The content has been either specifically contributed to the Archives of Latvian Folklore or collected by its staff members.
The Audio Recordings Archive (Suomen kielen nauhoitearkisto) holds over 23,000 hours of recordings collected since 1959, providing authentic samples of Finnish dialects, languages related to Finnish, and other world languages. The collection additionally includes samples of Finnish dialects spoken in Sweden, Norway, Ingria, the United States and Australia. Digitisation of the audio bank was undertaken in 1999. Over half of its content has been digitised, totalling about 13,000 hours of recordings.
The database consists of three sets:
- Many Talker Set: 30 males, 30 females; each to read 50 numbers, 1-2 connected passages, 1 block of "filler" sentences, and 1 block of syllables.
- Few Talker Set: 4 males, 4 females; each to read 50 numbers, 10 connected passages, 1 block of "filler" sentences, and 2-3 blocks of syllables.
- Very Few Talker Set: 1 male, 1 female; each to read 2 blocks of 50 numbers, 40 connected passages, 4 blocks of "filler" sentences, and 9 blocks of syllables.
Total amount: ca. 12 hours of speech.