« Previous |
1 - 10 of 15
|
Next »
Number of results to display per page
Search Results
2. Gewerkschaften in der Tschechoslowakei /
- Creator:
- Toman, Josef
- Type:
- text and studie
- Subject:
- Dějiny Česka a Slovenska, družstva, družstevnictví, politika hospodářská, Československo 1945-1992, and vnitřní politika
- Language:
- German
- Rights:
- unknown
3. Prague Czech-English Dependency Treebank 2.0
- Creator:
- Hajič, Jan, Hajičová, Eva, Panevová, Jarmila, Sgall, Petr, Cinková, Silvie, Fučíková, Eva, Mikulová, Marie, Pajas, Petr, Popelka, Jan, Semecký, Jiří, Šindlerová, Jana, Štěpánek, Jan, Toman, Josef, Urešová, Zdeňka, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- parallel treebank, PCEDT, parallel corpus, Wall Street Journal, WSJ, Penn Treebank, dependency annotation, and PDT
- Language:
- Czech and English
- Description:
- Texts The Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) is a major update of the Prague Czech-English Dependency Treebank 1.0 (LDC2004T25). It is a manually parsed Czech-English parallel corpus sized over 1.2 million running words in almost 50,000 sentences for each part. Data The English part contains the entire Penn Treebank - Wall Street Journal Section (LDC99T42). The Czech part consists of Czech translations of all of the Penn Treebank-WSJ texts. The corpus is 1:1 sentence-aligned. An additional automatic alignment on the node level (different for each annotation layer) is part of this release, too. The original Penn Treebank-like file structure (25 sections, each containing up to one hundred files) has been preserved. Only those PTB documents which have both POS and structural annotation (total of 2312 documents) have been translated to Czech and made part of this release. Each language part is enhanced with a comprehensive manual linguistic annotation in the PDT 2.0 style (LDC2006T01, Prague Dependency Treebank 2.0). The main features of this annotation style are: dependency structure of the content words and coordinating and similar structures (function words are attached as their attribute values) semantic labeling of content words and types of coordinating structures argument structure, including an argument structure ("valency") lexicon for both languages ellipsis and anaphora resolution. This annotation style is called tectogrammatical annotation and it constitutes the tectogrammatical layer in the corpus. For more details see below and documentation. Annotation of the Czech part Sentences of the Czech translation were automatically morphologically annotated and parsed into surface-syntax dependency trees in the PDT 2.0 annotation style. This annotation style is sometimes called analytical annotation; it constitutes the analytical layer of the corpus. The manual tectogrammatical (deep-syntax) annotation was built as a separate layer above the automatic analytical (surface-syntax) parse. A sample of 2,000 sentences was manually annotated on the analytical layer. Annotation of the English part The resulting manual tectogrammatical annotation was built above an automatic transformation of the original phrase-structure annotation of the Penn Treebank into surface dependency (analytical) representations, using the following additional linguistic information from other sources: PropBank (LDC2004T14) VerbNet NomBank (LDC2008T23) flat noun phrase structures (by courtesy of D. Vadas and J.R. Curran) For each sentence, the original Penn Treebank phrase structure trees are preserved in this corpus together with their links to the analytical and tectogrammatical annotation. and Ministry of Education of the Czech Republic projects No.: MSM0021620838 LC536 ME09008 LM2010013 7E09003+7E11051 7E11041 Czech Science Foundation, grants No.: GAP406/10/0875 GPP406/10/P193 GA405/09/0729 Research funds of the Faculty of Mathematics and Physics, Charles University, Czech Republic, Grant Agency of the Academy of Sciences of the Czech Republic: No. 1ET101120503 Students participating in this project have been running their own student grants from the Grant Agency of the Charles University, which were connected to this project. Only ongoing projects are mentioned: 116310, 158010, 3537/2011 Also, this work was funded in part by the following projects sponsored by the European Commission: Companions, No. 034434 EuroMatrix, No. 034291 EuroMatrixPlus, No. 231720 Faust, No. 247762
- Rights:
- CC-BY-NC-SA + LDC99T42, https://lindat.mff.cuni.cz/repository/xmlui/page/license-pcedt2, and RES
4. Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)
- Creator:
- Hajič, Jan, Bejček, Eduard, Bémová, Alevtina, Buráňová, Eva, Fučíková, Eva, Hajičová, Eva, Havelka, Jiří, Hlaváčová, Jaroslava, Homola, Petr, Ircing, Pavel, Kárník, Jiří, Kettnerová, Václava, Klyueva, Natalia, Kolářová, Veronika, Kučová, Lucie, Lopatková, Markéta, Mareček, David, Mikulová, Marie, Mírovský, Jiří, Nedoluzhko, Anna, Novák, Michal, Pajas, Petr, Panevová, Jarmila, Peterek, Nino, Poláková, Lucie, Popel, Martin, Popelka, Jan, Romportl, Jan, Rysová, Magdaléna, Semecký, Jiří, Sgall, Petr, Spoustová, Johanka, Straka, Milan, Straňák, Pavel, Synková, Pavlína, Ševčíková, Magda, Šindlerová, Jana, Štěpánek, Jan, Štěpánková, Barbora, Toman, Josef, Urešová, Zdeňka, Vidová Hladká, Barbora, Zeman, Daniel, Zikánová, Šárka, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- treebank, dependency, tectogrammatics, topic-focus articulation, multiword expressions, coreference, bridging relations, discourse, morphology, syntax, tokenization, lemmatization, semantic relations, lexical semantics, lexicon, valency, speech reconstruction, clauses, speech recognition, and spoken corpus
- Language:
- Czech
- Description:
- A richly annotated and genre-diversified language resource, The Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, or PDT-C in short in the sequel) is a consolidated release of the existing PDT-corpora of Czech data, uniformly annotated using the standard PDT scheme. PDT-corpora included in PDT-C: Prague Dependency Treebank (the original PDT contents, written newspaper and journal texts from three genres); Czech part of Prague Czech-English Dependency Treebank (translated financial texts, from English), Prague Dependency Treebank of Spoken Czech (spoken data, including audio and transcripts and multiple speech reconstruction annotation); PDT-Faust (user-generated texts). The difference from the separately published original treebanks can be briefly described as follows: it is published in one package, to allow easier data handling for all the datasets; the data is enhanced with a manual linguistic annotation at the morphological layer and new version of morphological dictionary is enclosed; a common valency lexicon for all four original parts is enclosed. Documentation provides two browsing and editing desktop tools (TrEd and MEd) and the corpus is also available online for searching using PML-TQ.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
5. VIADAT
- Creator:
- Böhm, Stanislav, Hajič, Jan, Srdečný, Vojtěch, Toman, Josef, and Košarko, Ondřej
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- infrastructure and toolService
- Subject:
- oral history, speech, and search
- Language:
- Czech
- Description:
- This component integrates other VIADAT modules; together with VIADAT-REPO this composes the Virtual Assistant for accessing historical audiovisual data. The zip archive contains sources for the following modules: VIADAT, VIADAT-DEPOSIT, VIADAT-TEXT, VIADAT-ANNOTATE, VIADAT-ANALYZE, VIADAT-STAT, VIADAT-GIS and VIADAT-SEARCH. Developed in cooperation with ÚSD AV ČR and NFA.
- Rights:
- BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB
6. VIADAT (2019-12-31)
- Creator:
- Böhm, Stanislav, Hajič, Jan, Srdečný, Vojtěch, Toman, Josef, and Košarko, Ondřej
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- infrastructure and toolService
- Subject:
- oral history, speech, and search
- Language:
- Czech
- Description:
- This component integrates other VIADAT modules; together with VIADAT-REPO this composes the Virtual Assistant for accessing historical audiovisual data. The zip archive contains sources for the following modules: VIADAT, VIADAT-DEPOSIT, VIADAT-TEXT, VIADAT-ANNOTATE, VIADAT-ANALYZE, VIADAT-STAT, VIADAT-GIS and VIADAT-SEARCH. Developed in cooperation with ÚSD AV ČR and NFA.
- Rights:
- BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB
7. VIADAT-ANALYZE
- Creator:
- Böhm, Stanislav, Hajič, Jan, Srdečný, Vojtěch, Toman, Josef, and Košarko, Ondřej
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- infrastructure and toolService
- Subject:
- oral history, speech, and search
- Language:
- Czech
- Description:
- A VIADAT module; VIADAT-ANALYZE is an interactive environment that enables the end user to work with stored, annotated and indexed audio recordings. Allowing visualization and extraction of results. Developed in cooperation with ÚSD AV ČR and NFA.
- Rights:
- BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB
8. VIADAT-ANALYZE (2019-12-31)
- Creator:
- Böhm, Stanislav, Hajič, Jan, Srdečný, Vojtěch, Toman, Josef, and Košarko, Ondřej
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- infrastructure and toolService
- Subject:
- oral history, speech, and search
- Language:
- Czech
- Description:
- A VIADAT module; VIADAT-ANALYZE is an interactive environment that enables the end user to work with stored, annotated and indexed audio recordings. Allowing visualization and extraction of results. Developed in cooperation with ÚSD AV ČR and NFA.
- Rights:
- BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB
9. VIADAT-ANNOTATE
- Creator:
- Böhm, Stanislav, Hajič, Jan, Srdečný, Vojtěch, Toman, Josef, and Košarko, Ondřej
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- infrastructure and toolService
- Subject:
- oral history, speech, and search
- Language:
- Czech
- Description:
- A VIADAT module; VIADAT-ANNOTATE is an interactive annotation environment. Developed in cooperation with ÚSD AV ČR and NFA.
- Rights:
- BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB
10. VIADAT-ANNOTATE (2019-12-31)
- Creator:
- Böhm, Stanislav, Hajič, Jan, Srdečný, Vojtěch, Toman, Josef, and Košarko, Ondřej
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- infrastructure and toolService
- Subject:
- oral history, speech, and search
- Language:
- Czech
- Description:
- A VIADAT module; VIADAT-ANNOTATE is an interactive annotation environment. Developed in cooperation with ÚSD AV ČR and NFA.
- Rights:
- BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB