Harvested from: LINDAT/CLARIAH-CZ repository / Type: corpus - LINDAT/CLARIAH-CZ Catalog Search Results


131. Corpus of contemporary blogs

Creator:: Grác, Marek
Publisher:: Masaryk University, NLP Centre
Type:: text and corpus
Subject:: corpus, blogs, annotation, annotators, sentences, and machine learning
Language:: Czech
Description:: At the NLP Centre, text is currently split into sentences by a rule-based tool. To create enough training data for machine learning, annotators manually split the contemporary-text corpus CBB.blog (1 million tokens) into sentences. Each file contains one hundredth of the whole corpus, and all data were processed in parallel by two annotators. The corpus was created from ten contemporary blogs: hintzu.otaku.cz, modnipeklo.cz, bloc.cz, aleneprokopova.blogspot.com, blog.aktualne.cz, fuchsova.blog.onaidnes.cz, havlik.blog.idnes.cz, blog.aktualne.centrum.cz, klusak.blogspot.cz, and myego.cz/welldone.
Rights:: Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0), http://creativecommons.org/licenses/by-nc-nd/3.0/, and PUB

132. Corpus of Early English Correspondence Sampler (CEECS)

Publisher:: University of Helsinki
Format:: text/plain
Type:: corpus
Language:: English
Description:: Personal correspondence from England written between 1418 and 1680. Compiled as a tool for historical sociolinguistics.
Rights:: Not specified

133. Corpus of Early Literary Finnish

Publisher:: The Research Institute for the Languages of Finland
Type:: corpus
Language:: Finnish
Description:: Period: 1809–1899
Rights:: Not specified

134. Corpus of Finnish Literary Classics

Publisher:: The Research Institute for the Languages of Finland
Type:: corpus
Language:: Finnish
Description:: Period: 1880s–1930s
Rights:: Not specified

135. Corpus of Italian Emblem Books

Publisher:: University of Glasgow
Type:: corpus
Language:: Italian
Description:: Italian emblem books from the Stirling Maxwell Collection (University of Glasgow). Transcribed text and photographic reproductions. Searchable and browsable online.
Rights:: Not specified

136. Corpus of Old Literary Finnish

Publisher:: The Research Institute for the Languages of Finland
Type:: corpus
Language:: Finnish
Description:: Period: 1543–1809
Rights:: Not specified

137. Corpus of Old Written Estonian

Publisher:: University of Tartu
Type:: corpus
Language:: Estonian
Description:: Corpus of texts written fully or partly in Estonian, from the 13th to the 19th century; 1.5 million words.
Rights:: Not specified

138. Corpus of precisely articulated Czech speech

Creator:: Hanzlíček, Zdeněk, Kochová, Pavla, Tihelka, Daniel, Kövérová, Markéta, Matoušek, Jindřich, and Ševeček, Pavel
Publisher:: University of West Bohemia, Department of Cybernetics and Lingea, s.r.o.
Type:: audio and corpus
Subject:: speech corpus, text-to-speech (TTS), speech synthesis, and hyperarticulated speech
Language:: Czech
Description:: The corpus contains speech data from two native Czech speakers, one male and one female. The speech is very precisely articulated, at times hyper-articulated, and the speech rate is low. Because of its emphasized articulation, the speech data are suitable for teaching Czech to foreigners and can also be used for people with hearing or speech impairments. The recorded sentences can be used either directly, e.g., as part of educational material, or as source data for building complex educational systems that incorporate speech synthesis technology. All recorded sentences were precisely orthographically annotated and phonetically segmented (i.e., split into phones) using modern neural-network-based methods.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

139. Corpus of Present-day Written Estonian

Type:: corpus
Language:: Estonian
Description:: General written language; 95 million words; TEI/SGML encoding.
Rights:: Not specified

140. Corpus of Proverbs and Other Colloquial Expressions

Publisher:: The Research Institute for the Languages of Finland
Type:: corpus
Language:: Finnish
Rights:: Not specified
