
Creator:: Koehn, Philipp, Heafield, Kenneth, Forcada, Mikel L., Esplà-Gomis, Miquel, Ortiz-Rojas, Sergio, Sánchez, Gema Ramírez, Cartagena, Víctor M. Sánchez, Haddow, Barry, Bañón, Marta, Střelec, Marek, Samiotou, Anna, and Kamran, Amir
Publisher:: ParaCrawl
Type:: text and corpus
Subject:: ParaCrawl, parallel corpus, CommonCrawl, machine translation, and text corpora
Language:: English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Romanian, Finnish, Latvian, Russian, and Estonian
Description:: The January 2018 release of ParaCrawl is the first version of the corpus. It contains parallel corpora for 11 languages paired with English, crawled from a large number of websites. The selection of websites is based on CommonCrawl, but ParaCrawl is extracted from a brand new crawl which has much higher coverage of these selected websites than CommonCrawl. Since the data is fairly raw, it is released with two quality metrics that can be used for corpus filtering. An official "clean" version of each corpus uses one of the metrics. For more details and raw data download, please visit: http://paracrawl.eu/releases.html
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

14. ParCzech 3.0

Creator:: Kopp, Matyáš, Stankov, Vladislav, Bojar, Ondřej, Hladká, Barbora, and Straňák, Pavel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: audio and corpus
Subject:: Parliament of the Czech Republic, Chamber of Deputies, stenographic protocols, TEI encoding, and speech corpus
Language:: Czech
Description:: The ParCzech 3.0 corpus is the third version of ParCzech, consisting of stenographic protocols that record the Chamber of Deputies’ meetings held in the 7th term (2013-2017) and the current 8th term (2017-Mar 2021). The protocols are provided in their original HTML format, in Parla-CLARIN TEI format, and in a format suitable for Automatic Speech Recognition. The corpus is automatically enriched with morphological, syntactic, and named-entity annotations using UDPipe 2 and NameTag 2. The audio files are aligned with the texts in the annotated TEI files.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

15. ParCzech 4.0

Creator:: Kopp, Matyáš
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: Parliament of the Czech Republic, Chamber of Deputies, stenographic protocols, TEI encoding, and speech corpus
Language:: Czech
Description:: The ParCzech 4.0 corpus consists of stenographic protocols that record the Chamber of Deputies' meetings in the 7th term (2013-2017), the 8th term (2017-2021) and the current 9th term (2021-Jul 2023). The protocols are provided in their original HTML format and in Parla-CLARIN TEI format. The corpus is automatically enriched with morphological, syntactic, and named-entity annotations using UDPipe 2 and NameTag 2. The audio files are aligned with the texts in the annotated TEI files. The audio files themselves are available in the AudioPSP 24.01 corpus (http://hdl.handle.net/11234/1-5404). This corpus covers the same period as the ParlaMint-CZ corpus v4.0 (http://hdl.handle.net/11356/1860). The ParCzech corpus follows and extends the ParlaMint schema. Both the annotated and non-annotated versions include hypertext references to voting and parliamentary prints. Beyond ParlaMint's recommendations, the annotated version contains source audio alignment, PDT xtag, and more detailed CNEC2.0 named entity categorization.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

16. ParCzech PS7 1.0

Creator:: Hladká, Barbora, Kopp, Matyáš, and Straňák, Pavel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: Parliament of the Czech Republic, Chamber of Deputies, stenographic protocols, TEI encoding, and TEITOK
Language:: Czech
Description:: The ParCzech PS7 1.0 corpus is the very first member of the corpus family of data coming from the Parliament of the Czech Republic. ParCzech PS7 1.0 consists of stenographic protocols that record the Chamber of Deputies' meetings held in the 7th term (2013-2017). The audio recordings are available as well. Transcripts are provided in the original HTML as harvested, and are also converted into a TEI-derived XML format for use in the TEITOK corpus manager. The corpus is automatically enriched with morphological and named-entity annotations using MorphoDiTa and NameTag.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

17. ParCzech PS7 2.0

Creator:: Hladká, Barbora, Kopp, Matyáš, and Straňák, Pavel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: Parliament of the Czech Republic, Chamber of Deputies, stenographic protocols, TEI encoding, and TEITOK
Language:: Czech
Description:: The ParCzech PS7 2.0 corpus is the second version of ParCzech PS7, consisting of stenographic protocols that record the Chamber of Deputies' meetings held in the 7th term (2013-2017). The protocols are provided in their original HTML format, in TEI format, and in a TEI-derived format that makes them searchable in the TEITOK corpus manager. Their audio recordings are available as well. The corpus is automatically enriched with morphological, syntactic, and named-entity annotations using UDPipe 2 and NameTag 2.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

18. Semantic Features and Their Role In Conceptual Representation In School Age Children

Creator:: Konečná, Kristýna
Publisher:: Palacky University, Department of General Linguistics
Type:: text, other, and lexicalConceptualResource
Subject:: semantic features, concepts, semantic categories, children language, language development, concept, semantic feature, school age children, and czech children
Language:: Czech
Description:: Language acquisition is one of the currently much discussed topics in the field of psycholinguistics. Considerable space for future research can be seen in the development of vocabulary in Czech-speaking children. In our case, we are mainly interested in the meaning, i.e. the content of acquired words (concepts), and the role of so-called semantic features in mental representation. The intended goal of our research is to bring new information from the above-mentioned area, to confirm or disprove some existing theoretical statements and to compare the results of foreign research with data obtained using the Czech language material. Similar research has been conducted in various world languages, but so far there are not many papers that address the issue in the Czech language environment. As part of our work, a comprehensive database of semantic features for selected concepts has been prepared. This database has been statistically processed and subsequently the data has been analyzed and interpreted on the basis of theories about the development of the child's speech competence. This material, obtained from children aged 8-9 (lower primary school) growing up in a Czech language environment, has been used in the next phase of research, in which an experiment with subjects belonging to the same age category has been performed: in a semantic task based on the phenomenon called semantic priming, the effect of featural similarity of two concepts on decision in a speeded task has been observed. The results of the research expand the range of information published so far in this scientific field in the Czech environment. This research can provide valuable insights into children's language acquisition issues. The data gathered can also be practically beneficial not only for teachers, psychologists and speech therapists, but also for parents, for example.
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB
