Harvested from: LINDAT/CLARIAH-CZ repository / Language: Finnish - LINDAT/CLARIAH-CZ Catalog Search Results

Start Over Language Finnish Harvested from LINDAT/CLARIAH-CZ repository

11. Comparable Russian-Finnish corpus of juridical texts

Publisher:: University of Tampere
Format:: application/octet-stream
Type:: corpus
Language:: Finnish and Russian
Description:: Juridical texts in Russian and Finnish arranged as a comparable text corpus
Rights:: Not specified

12. CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data

Creator:: Zeman, Daniel and Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: tokenization, word segmentation, morphology, tagging, syntax, parsing, and universal dependencies
Language:: Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
Description:: CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies This package contains the test data in the form in which they ware presented to the participating systems: raw text files and files preprocessed by UDPipe. The metadata.json files contain lists of files to process and to output; README files in the respective folders describe the syntax of metadata.json. For full training, development and gold standard test data, see Universal Dependencies 2.0 (CoNLL 2017) Universal Dependencies 2.2 (CoNLL 2018) See the download links at http://universaldependencies.org/. For more information on the shared tasks, see http://universaldependencies.org/conll17/ http://universaldependencies.org/conll18/ Contents: conll17-ud-test-2017-05-09 ... CoNLL 2017 test data conll18-ud-test-2018-05-06 ... CoNLL 2018 test data conll18-ud-test-2018-05-06-for-conll17 ... CoNLL 2018 test data with metadata and filenames modified so that it is digestible by the 2017 systems.
Rights:: Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB

13. CoNLL 2017 Shared Task System Outputs

Creator:: Zeman, Daniel, Potthast, Martin, Straka, Milan, Popel, Martin, Dozat, Timothy, Qi, Peng, Manning, Christopher, Shi, Tianze, Wu, Felix G., Chen, Xilun, Cheng, Yao, Björkelund, Anders, Falenska, Agnieszka, Yu, Xiang, Kuhn, Jonas, Che, Wanxiang, Guo, Jiang, Wang, Yuxuan, Zheng, Bo, Zhao, Huaipeng, Liu, Yang, Teng, Dechuan, Liu, Ting, Lim, Kyungtae, Poibeau, Thierry, Sato, Motoki, Manabe, Hitoshi, Noji, Hiroshi, Matsumoto, Yuji, Kırnap, Ömer, Önder, Berkay Furkan, Yuret, Deniz, Straková, Jana, Vania, Clara, Zhang, Xingxing, Lopez, Adam, Heinecke, Johannes, Asadullah, Munshi, Kanerva, Jenna, Luotolahti, Juhani, Ginter, Filip, Kuan, Yu, Sofroniev, Pavel, Schill, Erik, Hinrichs, Erhard, Nguyen, Dat Quoc, Dras, Mark, Johnson, Mark, Qian, Xian, Vilares, David, Gómez-Rodríguez, Carlos, Aufrant, Lauriane, Wisniewski, Guillaume, Yvon, François, Dumitrescu, Stefan Daniel, Boroş, Tiberiu, Tufiş, Dan, Das, Ayan, Zaffar, Affan, Sarkar, Sudeshna, Wang, Hao, Zhao, Hai, Zhang, Zhisong, Hornby, Ryan, Taylor, Clark, Park, Jungyeul, de Lhoneux, Miryam, Shao, Yan, Basirat, Ali, Kiperwasser, Eliyahu, Stymne, Sara, Goldberg, Yoav, Nivre, Joakim, Akkuş, Burak Kerim, Azizoglu, Heval, Cakici, Ruket, Moor, Christophe, Merlo, Paola, Henderson, James, Wang, Haozhou, Ji, Tao, Wu, Yuanbin, Lan, Man, de la Clergerie, Eric, Sagot, Benoît, Seddah, Djamé, More, Amir, Tsarfaty, Reut, Kanayama, Hiroshi, Muraoka, Masayasu, Yoshikawa, Katsumasa, Garcia, Marcos, and Gamallo, Pablo
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: dependency parser and parsebank
Language:: Arabic, Bulgarian, Russia Buriat, Czech, Catalan, Church Slavic, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, French, Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Swedish, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
Description:: This package contains the system outputs from the CoNLL 2017 Shared Task in Multilingual Parsing from Raw Text to Universal Dependencies.
Rights:: Licence Universal Dependencies v2.0, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.0, and PUB

14. CoNLL 2018 Shared Task System Outputs

Creator:: Zeman, Daniel, Potthast, Martin, Duthoo, Elie, Mesnard, Olivier, Rybak, Piotr, Wróblewska, Alina, Che, Wanxiang, Liu, Yijia, Wang, Yuxuan, Zheng, Bo, Liu, Ting, Li, Zuchao, He, Shexia, Zhang, Zhuosheng, Zhao, Hai, Wu, Yingting, Tong, Jia-Jun, Nguyen, Dat Quoc, Verspoor, Karin, Wan, Hui, Naseem, Tahira, Lee, Young-Suk, Castelli, Vittorio, Ballesteros, Miguel, Hershcovich, Daniel, Abend, Omri, Rappoport, Ari, Smith, Aaron, Bohnet, Bernd, de Lhoneux, Miryam, Nivre, Joakim, Shao, Yan, Stymne, Sara, Kırnap, Ömer, Dayanık, Erenay, Yuret, Deniz, Kanerva, Jenna, Ginter, Filip, Miekka, Niko, Leino, Akseli, Salakoski, Tapio, Lim, KyungTae, Park, Cheoneum, Lee, Changki, Poibeau, Thierry, Bhat, Riyaz Ahmad, Bhat, Irshad, Bangalore, Srinivas, Qi, Peng, Dozat, Timothy, Zhang, Yuhao, Manning, Christopher, Boroș, Tiberiu, Dumitrescu, Stefan Daniel, Burtica, Ruxandra, Arakelyan, Gor, Hambardzumyan, Karen, Khachatrian, Hrant, Rosa, Rudolf, Mareček, David, Straka, Milan, Seker, Amit, More, Amir, Tsarfaty, Reut, Önder, Berkay Furkan, Gümeli, Can, Jawahar, Ganesh, Muller, Benjamin, Fethi, Amal, Martin, Louis, Villemonte de la Clergerie, Eric, Sagot, Benoît, Seddah, Djamé, Özateş, Şaziye Betül, Özgür, Arzucan, Gungor, Tunga, Öztürk, Balkız, Ji, Tao, Liu, Yufang, Wang, Yijun, Wu, Yuanbin, Lan, Man, Chen, Danlu, Lin, Mengxiao, Hu, Zhifeng, and Qiu, Xipeng
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: parsed data, conllu, and universal dependencies
Language:: Afrikaans, Arabic, Breton, Bulgarian, Russia Buriat, Catalan, Czech, Church Slavic, Danish, German, Modern Greek (1453-), English, Estonian, Basque, Faroese, Persian, Finnish, French, Old French (842-ca. 1400), Irish, Galician, Gothic, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Upper Sorbian, Hungarian, Armenian, Indonesian, Italian, Japanese, Kazakh, Northern Kurdish, Korean, Latin, Latvian, Dutch, Norwegian, Nigerian Pidgin, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Northern Sami, Spanish, Serbian, Swedish, Thai, Turkish, Uighur, Ukrainian, Urdu, Vietnamese, and Chinese
Description:: Test data parsed by systems submitted to the CoNLL 2018 UD parsing shared task.
Rights:: Licence Universal Dependencies v2.2, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2, and PUB

15. Corpus of Early Literary Finnish

Publisher:: The Research Institute for the Languages of Finland
Type:: corpus
Language:: Finnish
Description:: This corpus contains a variety of works written in Finnish published between 1809 and 1899, such as newspapers, periodicals, almanacs, and decrees. The corpus contains 8,976,561 words and is available for online browsing.
Rights:: Not specified

16. Corpus of Finnish Literary Classics

Publisher:: The Research Institute for the Languages of Finland
Type:: corpus
Language:: Finnish
Description:: The classics of Finnish literature corpus contains works by established Finnish fiction writers from the 1880s to the 1930s. The corpus is part of speech tagged and available for online browsing via the concordancer Korp.
Rights:: Not specified

17. Corpus of Old Literary Finnish

Publisher:: The Research Institute for the Languages of Finland
Type:: corpus
Language:: Finnish
Description:: This is a linguistically unannotated corpus of various historical texts written between 1543 and 1809. The corpus consists of 3,428,618 words and is available for online browsing.
Rights:: Not specified

18. Corpus of Proverbs and Other Colloquial Expressions

Publisher:: The Research Institute for the Languages of Finland
Type:: corpus
Language:: Finnish
Rights:: Not specified

19. DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking

Creator:: Kubeša, David and Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: entity linking, NEL, NER, dataset, and knowledge base
Language:: Afrikaans, Arabic, Armenian, Basque, Belarusian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latin, Latvian, Lithuanian, Maltese, Marathi, Modern Greek (1453-), Northern Sami, Norwegian Nynorsk, Persian, Polish, Portuguese, Romanian, Russian, Scottish Gaelic, Serbian, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Uighur, Ukrainian, Urdu, Vietnamese, and Wolof
Description:: We present DaMuEL, a large Multilingual Dataset for Entity Linking containing data in 53 languages. DaMuEL consists of two components: a knowledge base that contains language-agnostic information about entities, including their claims from Wikidata and named entity types (PER, ORG, LOC, EVENT, BRAND, WORK_OF_ART, MANUFACTURED); and Wikipedia texts with entity mentions linked to the knowledge base, along with language-specific text from Wikidata such as labels, aliases, and descriptions, stored separately for each language. The Wikidata QID is used as a persistent, language-agnostic identifier, enabling the combination of the knowledge base with language-specific texts and information for each entity. Wikipedia documents deliberately annotate only a single mention for every entity present; we further automatically detect all mentions of named entities linked from each document. The dataset contains 27.9M named entities in the knowledge base and 12.3G tokens from Wikipedia texts. The dataset is published under the CC BY-SA licence.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

20. Database of etymological references(Etymologinen viitetietokanta)

Type:: lexicalConceptualResource
Language:: Finnish
Description:: 128,000 word forms. Database
Rights:: Not specified

« Previous
Next »
1
2
3
4
5
6
…
9
10

11. Comparable Russian-Finnish corpus of juridical texts

12. CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data

13. CoNLL 2017 Shared Task System Outputs

14. CoNLL 2018 Shared Task System Outputs

15. Corpus of Early Literary Finnish

16. Corpus of Finnish Literary Classics

17. Corpus of Old Literary Finnish

18. Corpus of Proverbs and Other Colloquial Expressions

19. DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking

20. Database of etymological references(Etymologinen viitetietokanta)

Limit your search

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Show values starting with

Coverage

Creator

Show values starting with

Format

Language

Show values starting with

Publisher

Show values starting with

Rights

Show values starting with

Subject

Show values starting with

Type

Date

Original context has metadata only

Harvested from