dc.contributor.author Rosa, Rudolf
dc.contributor.author Zeman, Daniel
dc.contributor.author Mareček, David
dc.contributor.author Žabokrtský, Zdeněk
dc.date.accessioned 2017-03-24T16:01:36Z
dc.date.available 2017-03-24T16:01:36Z
dc.date.issued 2017-01-28
dc.identifier.uri http://hdl.handle.net/11234/1-1971
dc.description Trained models for UDPipe used to produce our final submission to the VarDial 2017 CLP shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK model was trained on CS data, the HR model on SL data, and the NO model on a concatenation of DA and SV data. The scripts and commands used to create the models are part of a separate submission (http://hdl.handle.net/11234/1-1970). The models were trained with UDPipe version 3e65d69 from 3 January 2017, obtained from https://github.com/ufal/udpipe. Their functionality with newer or older versions of UDPipe is not guaranteed.

Below are the Bash command sequences that reproduce the results we submitted to VarDial 2017. The input files must be in CoNLL-U format, and UDPipe must be installed. The models use only the form, UPOS, and Universal Features fields (the SK model uses only the form). The feats2FEAT.py script, which prunes the universal features, is bundled with this submission.

SK: tag and parse with the model:
    udpipe --tag --parse sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu
A slightly better after-deadline model (sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe), which we mention in the accompanying paper, is also included. It is applied in the same way:
    udpipe --tag --parse sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu

HR: prune the Features to keep only Case, then parse with the model:
    python3 feats2FEAT.py Case < hr-ud-predPoS-test.conllu | udpipe --parse hr-translex.v2.norm.Case.w2v.trainonpred.udpipe

NO: set the UPOS annotation aside, tag Features with the model, merge the result with the set-aside UPOS annotation, and parse with the model. This detour is needed because UDPipe cannot be told to keep UPOS and change only Features:
    cut -f1-4 no-ud-predPoS-test.conllu > tmp
    udpipe --tag no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe no-ud-predPoS-test.conllu | cut -f5- | paste tmp - | sed 's/^\t$//' | udpipe --parse no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
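The two preprocessing steps in the HR and NO command sequences (pruning Features down to Case, and combining the original UPOS with newly tagged Features) can be illustrated per token line in Python. This is only a sketch under assumptions: it is not the bundled feats2FEAT.py, whose exact behaviour is not described in this record, and the function names `prune_feats` and `merge_upos` are hypothetical. It assumes standard 10-column CoNLL-U token lines and, for the merge, that the original and tagged files are line-aligned.

```python
# Illustrative sketch only; this is NOT the bundled feats2FEAT.py.
# Function names are hypothetical and chosen for this example.

FEATS_COL = 5  # zero-based index of the FEATS column in CoNLL-U


def prune_feats(line, keep):
    """Keep only the features named in `keep` on one CoNLL-U token line."""
    text = line.rstrip("\n")
    cols = text.split("\t")
    if text.startswith("#") or len(cols) != 10:
        return text  # comment, blank, or non-token line: leave unchanged
    feats = [f for f in cols[FEATS_COL].split("|") if f.split("=")[0] in keep]
    cols[FEATS_COL] = "|".join(feats) if feats else "_"
    return "\t".join(cols)


def merge_upos(original, tagged):
    """Combine columns 1-4 of `original` with columns 5-10 of `tagged`.

    Mirrors the `cut -f1-4 ... | paste` step: ID, FORM, LEMMA and UPOS
    come from the original file, everything else from the tagger output.
    """
    a = original.rstrip("\n").split("\t")
    b = tagged.rstrip("\n").split("\t")
    if len(a) != 10 or len(b) != 10:
        return original.rstrip("\n")  # not a pair of token lines
    return "\t".join(a[:4] + b[4:])


if __name__ == "__main__":
    token = "1\tpsa\tpes\tNOUN\t_\tAnimacy=Anim|Case=Acc|Number=Sing\t2\tobj\t_\t_"
    print(prune_feats(token, {"Case"}))
```

Working line by line keeps the sketch close to the shell pipeline, which also treats the file as a stream of tab-separated rows.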
dc.language.iso slk
dc.language.iso hrv
dc.language.iso nor
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation info:eu-repo/grantAgreement/EC/H2020/644402
dc.relation.isreferencedby http://web.science.mq.edu.au/~smalmasi/vardial4/pdf/VarDial26.pdf
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject parsing
dc.subject dependency parser
dc.subject cross-lingual parsing
dc.subject universal dependencies
dc.title Slavic Forest, Norwegian Wood (models)
dc.type toolService
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
metashare.ResourceInfo#ContentInfo.detailedType other
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Rudolf Rosa rosa@ufal.mff.cuni.cz Charles University, UFAL
sponsor European Union EC/H2020/644402 HimL - Health in my Language euFunds info:eu-repo/grantAgreement/EC/H2020/644402
sponsor Univerzita Karlova (mimo GAUK) SVV 260 333 Specifický vysokoškolský výzkum nationalFunds
sponsor Grantová agentura České republiky 15-10472S Morphologically and Syntactically Annotated Corpora of Many Languages nationalFunds
sponsor Ministerstvo školství, mládeže a tělovýchovy České republiky LM2015071 LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat nationalFunds
sponsor Grantová agentura Univerzity Karlovy v Praze GAUK 15723/2014 Modelování závislostní syntaxe napříč jazyky nationalFunds
files.size 211839718
files.count 5


Files in this item (202.03 MB in total)

Name: feats2FEAT.py
Size: 412 bytes
Format: Unknown
Description: Features pruning script
MD5: 5089de1e63c1aa36cf284bb85600365c

Name: no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
Size: 28.54 MB
Format: Unknown
Description: Model for parsing Norwegian (and tagging Norwegian Case)
MD5: af624d0dcde21068f51da7c2a4511780

Name: hr-translex.v2.norm.Case.w2v.trainonpred.udpipe
Size: 50.82 MB
Format: Unknown
Description: Model for parsing Croatian
MD5: 9281c6a9cf0cf1df0e7466bc1d8ba2fa

Name: sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe
Size: 58.83 MB
Format: Unknown
Description: Model for tagging and parsing Slovak
MD5: 1d3793c42d2a75e14074dbbef8fdc5bf

Name: sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe
Size: 63.83 MB
Format: Unknown
Description: Better after-deadline model for tagging and parsing Slovak
MD5: e3b6101b345e6ffe361ac0c83ccc41fd
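The MD5 values listed for each file can be used to check that a download is intact. The following is a minimal sketch, assuming the five files have been saved under their listed names in a single directory. The helper names `md5_of` and `verify` are illustrative and not part of this submission.

```python
import hashlib
import os

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "feats2FEAT.py": "5089de1e63c1aa36cf284bb85600365c",
    "no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe": "af624d0dcde21068f51da7c2a4511780",
    "hr-translex.v2.norm.Case.w2v.trainonpred.udpipe": "9281c6a9cf0cf1df0e7466bc1d8ba2fa",
    "sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe": "1d3793c42d2a75e14074dbbef8fdc5bf",
    "sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe": "e3b6101b345e6ffe361ac0c83ccc41fd",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        results[name] = md5_of(path) == expected if os.path.isfile(path) else None
    return results


if __name__ == "__main__":
    for name, status in verify().items():
        label = {True: "OK", False: "MISMATCH", None: "missing"}[status]
        print(f"{label:9} {name}")
```

Reading in chunks keeps memory use small for the larger model files.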
