A large web corpus (over 10 billion tokens) in more than 50 languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general web crawl to date, with about 2 billion crawled URLs.
Comprehensive Arabic LEMmas is a lexicon covering a large list of Arabic lemmas and their corresponding inflected word forms (stems), with details (POS and root). Each lexical entry consists of a lemma followed by all its possible stems, and each stem is enriched with its morphological features, in particular the root and the POS.
It is composed of 164,845 lemmas representing 7,200,918 stems, detailed as follows:
757 Arabic particles
2,464,631 verbal stems
4,735,587 nominal stems
The lexicon is provided as an LMF-conformant XML file in UTF-8 encoding, amounting to about 1.22 GB of data.
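Because the file is large, reading it as a stream is likely more practical than loading it whole. The following is a minimal sketch, not the distributed tooling. It assumes element and attribute names commonly seen in LMF serializations (`LexicalEntry`, `Lemma`, `WordForm`, `feat` with `att`/`val`) and no XML namespace; the actual file may differ, and the sample values are placeholders rather than real lexicon data.

```python
import io
import xml.etree.ElementTree as ET


def feats(element):
    """Collect the <feat att="..." val="..."/> children of an element into a dict."""
    return {f.get("att"): f.get("val") for f in element.findall("feat")}


def iter_entries(source):
    """Yield (lemma, stems) pairs one lexical entry at a time.

    ASSUMPTION: tag names follow a typical LMF layout; adjust them to the real file.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "LexicalEntry":
            continue
        lemma_el = elem.find("Lemma")
        lemma = feats(lemma_el).get("writtenForm") if lemma_el is not None else None
        stems = [feats(wf) for wf in elem.findall("WordForm")]
        yield lemma, stems
        elem.clear()  # drop the entry's children once it has been consumed


# Placeholder sample (hypothetical values, not taken from the lexicon).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<LexicalResource>
  <Lexicon>
    <LexicalEntry>
      <Lemma><feat att="writtenForm" val="lemma_1"/></Lemma>
      <WordForm>
        <feat att="writtenForm" val="stem_1a"/>
        <feat att="partOfSpeech" val="verb"/>
        <feat att="root" val="root_1"/>
      </WordForm>
      <WordForm>
        <feat att="writtenForm" val="stem_1b"/>
        <feat att="partOfSpeech" val="verb"/>
        <feat att="root" val="root_1"/>
      </WordForm>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""

entries = list(iter_entries(io.BytesIO(SAMPLE)))
for lemma, stems in entries:
    print(lemma, [s["writtenForm"] for s in stems])
```

To process the real file, pass an open binary file handle or a path to `iter_entries` instead of the in-memory sample.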
Citation:
– Namly Driss, Karim Bouzoubaa, Abdelhamid El Jihad, and Si Lhoussain Aouragh. “Improving Arabic Lemmatization Through a Lemmas Database and a Machine-Learning Technique.” In Recent Advances in NLP: The Case of Arabic Language, pp. 81-100. Springer, Cham, 2020.
This corpus was originally created for performance testing of the CorpusExplorer server infrastructure (see diskurslinguistik.net / diskursmonitor.de). It contains the filtered, German-only subset of CommonCrawl (as of March 2018). First, the URLs were filtered by top-level domain (de, at, ch). The texts were then classified using NTextCat, and only texts unambiguously identified as German were included in the corpus. Finally, the texts were annotated using TreeTagger (token, lemma, part-of-speech). The corpus comprises 2.58 million documents, 232.87 million sentences, and 3.021 billion tokens. You can use CorpusExplorer (http://hdl.handle.net/11234/1-2634) to convert this data into various other corpus formats (XML, JSON, Weblicht, TXM and many more).
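The first filtering step, selecting URLs by top-level domain, can be illustrated with a short sketch. This is not the code used to build the corpus; the function name `has_german_tld` and the example URLs are hypothetical, and the later language-identification and tagging steps (NTextCat, TreeTagger) are not reproduced here.

```python
from urllib.parse import urlsplit

# Top-level domains named in the corpus description.
GERMAN_TLDS = {"de", "at", "ch"}


def has_german_tld(url):
    """Return True if the URL's host ends in one of the listed top-level domains."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if no host is present
    if not host:
        return False
    return host.rstrip(".").rsplit(".", 1)[-1] in GERMAN_TLDS


# Hypothetical example URLs.
urls = [
    "https://www.example.de/page",
    "http://example.com/de/",
    "https://news.example.AT:8080/x",
]
kept = [u for u in urls if has_german_tld(u)]
print(kept)
```

A domain filter of this kind only narrows the candidate set, which fits the description's separate language-classification step afterwards.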
On Tuesday, 29 April 2015, the Živa awards for the best articles of the previous year's volume, conferred by the journal's editorial board and editorial office, were ceremonially announced in the representative premises of the Academy of Sciences of the Czech Republic at Villa Lanna in Prague. The selected best contributions to Živa in 2014 and three eminent personalities of the journal were awarded special prizes. Authors: Jana Šrotová, Andrej Funk, Lucie Krouzová.
Relationship extraction models for the Czech language. The models are trained on CERED (a dataset created by distant supervision on Czech Wikipedia and Wikidata) and recognize a subset of Wikidata relations (listed in CEREDx.LABELS).
We supply a demo.py script that performs inference on user-defined input, and a requirements.txt file for pip. Adapt the demo code to use the model.
Both the dataset and the models are presented in the Relationship Extraction thesis.
This article summarizes current views on the intraspecific genetic differentiation and taxonomy of the Common Adder (Vipera berus). Special attention is paid to the genetic affiliation of Czech adder populations. Authors: Jiří Moravec, Jiří Šmíd.
Water fleas of the genus Daphnia are intensively studied aquatic invertebrates, widely used as "model organisms". The discovery of a species of this genus new to science, in a pool in the Kokořín region of Central Bohemia, therefore came as a great surprise. The species was named after the prominent Czech hydrobiologist Jaroslav Hrbáček, although its name also alludes to its humped body shape. All localities known so far are restricted to Czechia (Bohemia) and Slovakia. Authors: Adam Petrusek, Petr Jan Juračka.
The article introduces Centaurea, a taxonomically critical genus of the family Asteraceae. Current views on the delimitation of the genus and its internal classification, based on molecular data, are discussed. The article then describes hybridization, which is very common in the genus, and its relation to polyploidy: taxa of the same ploidy level usually hybridize easily and form extensive and variable hybrid swarms, whereas taxa of different ploidy levels are separated by a strong reproductive barrier. Author: Petr Koutecký.