Harvested from: LINDAT/CLARIAH-CZ repository / Type: toolService - LINDAT/CLARIAH-CZ Catalog Search Results

1. A simplified front-end for SemTi-Kamols morphological analyser

Publisher:: Institute of Mathematics and Computer Science, University of Latvia
Type:: toolService
Subject:: morphological analyzer
Language:: Latvian
Description:: A simplified front-end (in the form of a RESTful web service) for the SemTi-Kamols morphological analyzer. Mainly for demonstration purposes.
Rights:: Not specified

2. ABC - Language Identifier

Publisher:: Research Institute for Artificial Intelligence, Romanian Academy of Sciences
Type:: toolService
Description:: The application, developed in C#, automatically identifies the language of a text written in one of the 21 European Union languages. Using training texts in different languages (approx. 1.5Mb of text for each language), a training module counts the prefixes (the first 3 characters) and the suffixes (4-character endings) of all the words in the texts, for each language. For every language two models are constructed, containing the weights (percentages) of prefixes and suffixes in the texts representing that language. In the prediction phase, two models are built on the fly for the new text in the same manner. These models are then compared with the stored models representing each language for which the application was trained. Using comparison functions, the best model is chosen. More detailed descriptions are available in the following papers (http://www.racai.ro/~tufis/papers): -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2008). RACAI's Linguistic Web Services. In Proceedings of the 6th Language Resources and Evaluation Conference - LREC 2008, Marrakech, Morocco, May 2008. ELRA - European Language Resources Association. ISBN 2-9517408-4-0. -- Dan Tufiş and Alexandru Ceauşu (2007). Diacritics Restoration in Romanian Texts. In Elena Paskaleva and Milena Slavcheva (eds.), A Common Natural Language Processing Paradigm for Balkan Languages - RANLP 2007 Workshop Proceedings, pp. 49-56, Borovets, Bulgaria, September 2007. INCOMA Ltd., Shoumen, Bulgaria. ISBN 978-954-91743-8-0. -- Dan Tufiş and Adrian Chiţu (1999). Automatic Insertion of Diacritics in Romanian Texts. In Ferenc Kiefer, Gábor Kiss, and Júlia Pajzs (eds.), Proceedings of the 5th International Workshop on Computational Lexicography (COMPLEX 1999), pp. 185-194, Pecs, Hungary, May 1999. Linguistics Institute, Hungarian Academy of Sciences.
Rights:: Not specified
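
Note (illustrative sketch, not the catalogued C# application): the prefix/suffix approach described for ABC can be sketched in Python roughly as follows. The function names, the tiny training texts and the comparison function (`overlap`, the shared probability mass of two profiles) are assumptions made for illustration; the description does not say which comparison functions the tool actually uses.

```python
from collections import Counter

def affix_profiles(text, prefix_len=3, suffix_len=4):
    """Return (prefix_weights, suffix_weights) as relative frequencies."""
    prefixes, suffixes = Counter(), Counter()
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'()")
        if not word:
            continue
        prefixes[word[:prefix_len]] += 1   # first 3 characters
        suffixes[word[-suffix_len:]] += 1  # last 4 characters

    def normalise(counts):
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()} if total else {}

    return normalise(prefixes), normalise(suffixes)

def overlap(a, b):
    """Assumed comparison function: probability mass shared by two profiles."""
    return sum(min(w, b.get(k, 0.0)) for k, w in a.items())

def identify(text, models):
    """Build profiles for the new text and pick the closest stored model."""
    pre, suf = affix_profiles(text)
    return max(
        models,
        key=lambda lang: overlap(pre, models[lang][0]) + overlap(suf, models[lang][1]),
    )

# Toy training data (hypothetical; the tool uses about 1.5Mb per language).
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is sleeping in the house",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze schläft im haus",
}
MODELS = {lang: affix_profiles(text) for lang, text in TRAINING.items()}

if __name__ == "__main__":
    print(identify("the dog is sleeping", MODELS))
    print(identify("der hund schläft im haus", MODELS))
```

With realistic amounts of training text, the same two-profile comparison would be run against every trained language.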

3. Access rights Management System

Publisher:: Max Planck Institute for Psycholinguistics
Type:: toolService
Description:: A tool to grant and deny access to (parts of) an IMDI-based corpus. Supports advanced settings such as ACLs.
Rights:: Not specified

4. Annex - Annotation Exploration tool

Publisher:: Max Planck Institute for Psycholinguistics
Type:: toolService
Description:: A tool in the MPI web-based framework for archive exploration (and enrichment).
Rights:: Not specified

5. ANNIS

Publisher:: University of Potsdam, Dept. of Linguistics and Humboldt-University Berlin, Institut für deutsche Sprache und Linguistik
Type:: toolService
Description:: ANNIS2 is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, has been designed to provide access to the data of the SFB 632 - "Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts". Since information structure interacts with linguistic phenomena on many levels, ANNIS2 addresses the SFB's need to concurrently annotate, query and visualize data from such varied areas as syntax, semantics, morphology, prosody, referentiality, lexis and more. For projects working with spoken language, support for audio / video annotations is also required.
Rights:: Not specified

6. Annotate

Creator:: Roček, Martin
Publisher:: Charles University, Faculty of Arts
Type:: TEXT and toolService
Subject:: manuscripts, annotation, application, TEI, and JavaScript
Language:: No linguistic content
Description:: Annotate is a web and desktop application that should simplify the process of transforming photos of manuscripts to a browsable collection. It also allows users to annotate parts of the displayed images.
Rights:: GNU Library or "Lesser" General Public License 3.0 (LGPL-3.0), http://opensource.org/licenses/LGPL-3.0, and PUB

7. Anotatornia

Publisher:: Institute of Computer Science, Polish Academy of Sciences
Type:: toolService
Description:: Tool for manual on-line annotation of corpora at various linguistic levels. The levels currently implemented are: word-level and sentence-level segmentation, morphosyntax, word sense disambiguation. Anotatornia implements sophisticated mechanisms for managing texts, annotators and conflicts.
Rights:: Not specified

8. Apertium Old Catalan morphological analyzer

Publisher:: Universidad de Alicante
Type:: toolService
Subject:: morphological analyzer
Language:: Catalan
Description:: A RESTful morphological analyzer for Old Catalan.
Rights:: Not specified

9. Araucaria

Publisher:: School of Computing, University of Dundee
Type:: toolService
Subject:: argument analyzer
Description:: Araucaria is a software tool for analysing arguments. It aids a user in reconstructing and diagramming an argument using a simple point-and-click interface. The software also supports argumentation schemes, and provides a user-customisable set of schemes with which to analyse arguments. Written in Java, released under the GNU General Public License.
Rights:: Not specified

10. Assigning lemmas and part-of-speech to wordform lists

Type:: toolService
Language:: Slovenian
Description:: online service
Rights:: Not specified

11. Atlas of Place Names

Publisher:: The Research Institute for the Languages of Finland
Type:: toolService
Language:: Finnish
Description:: The digital atlas illustrates the distribution of 234 common Finnish place-name elements based on data in the Names Archive.
Rights:: Not specified

12. Bibliografie zur deutschen Grammatik (BDG)

Publisher:: Institut für Deutsche Sprache
Type:: toolService
Language:: German
Description:: Online Bibliography, bibliographic database
Rights:: Not specified

13. BitPar

Creator:: Schmid, Helmut
Publisher:: University of Stuttgart
Type:: toolService
Subject:: parser
Description:: Statistical parser
Rights:: Not specified

14. BNF Converter

Publisher:: Språkbanken, Dept. of Swedish Language, Göteborg University
Type:: toolService
Subject:: compiler construction and grammar
Description:: The BNF Converter is a compiler construction tool generating a compiler front-end from a Labelled BNF grammar.
Rights:: Not specified

15. BulTreeBank Morphological Analyzer

Creator:: Simov, Kiril and Osenova, Petya
Publisher:: Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:: toolService
Description:: It uses a morphological lexicon of Bulgarian (100 000 lemmas) compiled as a finite-state automaton in the CLaRK System. It requires the text to be tokenized first and is applied to each token. It also includes guessers for unknown words and Named Entity gazetteers. If the corresponding resources are available for a different language, it can be adapted to that language.
Rights:: Not specified

16. BulTreeBank Morphosyntactic Disambiguator

Creator:: Simov, Kiril, Osenova, Petya, and Simov, Alex
Publisher:: Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:: toolService
Description:: This is a hybrid system: rules, neural network, rules. First, rules for the sure cases are applied; then a neural network disambiguator is applied; finally, rules repair the most frequent errors of the neural network. The rules are implemented as constraints in the CLaRK System. The neural network is an additional module implemented in Java. It is called CLaRK. It requires morphologically annotated input.
Rights:: Not specified

17. BulTreeBank Tokenizer

Creator:: Simov, Kiril
Publisher:: Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:: toolService
Description:: The tokenizer covers all languages that use the Latin1, Latin2, Latin3 and Cyrillic tables of Unicode. It can be extended to cover other Unicode tables if necessary. It is implemented as a cascaded regular grammar in CLaRK. It recognizes over 60 token categories and is easy to adapt to new token categories.
Rights:: Not specified

18. BUSCANEO

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Language:: Catalan and Spanish
Description:: Tool for neologism extraction.
Rights:: Not specified

19. Bústia Neològica Escolar

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Language:: Catalan and Spanish
Description:: Terminology management
Rights:: Not specified

20. Bwananet

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Language:: Catalan, English, and Spanish
Description:: Tool for querying the Technical Corpus of the Institut Universitari de Lingüística Aplicada.
Rights:: Not specified

21. calcular_p_cue_class

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Statistical analysis service: It calculates P(cue|class), the probability of seeing a linguistic cue given a lexical class. This probability is computed from the occurrences of cues in a corpus (encoded in the signatures file) and from information on whether each word belongs to each class (encoded in the indicators file). The probability is computed for each cue in the signatures file and for each class in the indicators file.
Rights:: Not specified
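
Note (illustrative sketch, not the catalogued service): one plausible reading of the P(cue|class) computation is a relative-frequency estimate over the words that belong to each class. The in-memory layout of the signatures and indicators data, the function name and the estimator itself are assumptions; the actual file formats and formula are not given in the description.

```python
from collections import defaultdict

def p_cue_given_class(signatures, indicators):
    """Estimate P(cue | class) by relative frequency.

    signatures: word -> {cue: number of occurrences of the word with that cue}
    indicators: word -> {class: True/False membership}
    (Both layouts are hypothetical stand-ins for the two input files.)
    """
    cue_counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for word, cues in signatures.items():
        for cls, member in indicators.get(word, {}).items():
            if not member:
                continue
            for cue, n in cues.items():
                cue_counts[cls][cue] += n
                totals[cls] += n
    return {
        cls: {cue: n / totals[cls] for cue, n in cues.items()}
        for cls, cues in cue_counts.items()
        if totals[cls]
    }

# Toy data: cue occurrences per noun and mass/count class membership.
SIGNATURES = {
    "water": {"much": 3, "a": 1},
    "sand": {"much": 1},
    "dog": {"a": 4, "many": 2},
}
INDICATORS = {
    "water": {"mass": True, "count": False},
    "sand": {"mass": True, "count": False},
    "dog": {"mass": False, "count": True},
}

if __name__ == "__main__":
    for cls, dist in p_cue_given_class(SIGNATURES, INDICATORS).items():
        print(cls, dist)
```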

22. Catalan Annotated Corpora CQP

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: This RESTful service allows users to define a sub-corpus from different annotated corpora. The service includes a POS tag harmonisation process in which the original tags are converted to the EAGLES/Parole format. The resulting sub-corpus is indexed using the IMS CWB tool. The user receives an ID which can be used by the CQP service to exploit the sub-corpus.
Rights:: Not specified

23. Catalan Digital Press

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: This RESTful service accesses part of the Hemeroteca Digital de l’Arxiu Municipal de Girona (digital press archive from the Girona city council), specifically Catalan press from 2003. The service uses the SRU protocol.
Rights:: Not specified

24. catdoc

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Format conversion service: Word .doc to .txt converter
Rights:: Not specified

25. Cercador NEOROM

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Language:: Catalan and Spanish
Description:: Search engine for the neologisms database of the NEOROM network. The network collects neologisms used in the press written in Romance languages from 2005 onwards.
Rights:: Not specified

26. Cercador OBNEO

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Search engine of the BOBNEO data bank, a database of neologisms present in the mass media in Spanish and Catalan, written and oral, from 1992.
Rights:: Not specified

27. Česílko

Creator:: Hajič, Jan, Kuboň, Vladislav, and Homola, Petr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService
Subject:: machine translation and Czech-Slovak translation
Language:: Czech
Description:: Česílko is a tool enabling the fast and efficient translation from one source language into many target languages, which are mutually related.
Rights:: Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0), http://creativecommons.org/licenses/by-nc-nd/3.0/, and PUB

28. Česílko 2.0 Shallow Transfer RBMT framework (opensource version)

Creator:: Vičič, Jernej, Kuboň, Vladislav, and Homola, Petr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: Shallow Parse, Shallow Transfer Rule-Based Machine Translation, stochastic ranker, related languages, and toolbox
Description:: The system Česílko (language data and software tools) was first developed as an answer to a growing need for translation and localisation from one source language to many target languages. The original system belonged to the Shallow Parse, Shallow Transfer Rule-Based Machine Translation (RBMT) paradigm and was designed primarily for translation between related languages. The latest implementation of the system uses a stochastic ranker, so technically it belongs to the hybrid machine translation paradigm, combining stochastic methods with the traditional Shallow Transfer RBMT methods. The system has been stripped of the accompanying language resources due to copyright restrictions. The available data is for demonstration purposes only.
Rights:: Not specified

29. Cesilko Web Service for Weblicht

Creator:: Hajič, Jan, Kuboň, Vladislav, and Homola, Petr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and service
Subject:: machine translation
Description:: Weblicht integration of Cesilko (http://hdl.handle.net/11858/00-097C-0000-0006-AAFE-A)
Rights:: Not specified

30. Chared

Creator:: Pomikálek, Jan
Publisher:: Masaryk University, NLP Centre
Type:: toolService and tool
Subject:: character encoding, character encoding detection, charset, and unicode
Language:: English
Description:: Chared is a software tool which can detect the character encoding of a text document provided the language of the document is known. The language of the text has to be specified as an input parameter so that the corresponding language model can be used. The package contains models for a wide range of languages (currently 57, covering all major languages). Furthermore, it provides a training script to learn models for additional languages using a set of user-supplied sample HTML pages in the given language. The detection algorithm is based on determining the similarity of byte trigram vectors. In general, chared should be more accurate than other character encoding detection tools with no language constraints. This is an important advantage allowing the precise character decoding needed for building large textual corpora. The tool has been used for building corpora in American Spanish, Arabic, Czech, French, Japanese, Russian, Tajik, and six Turkic languages consisting of 70 billion tokens altogether. Chared is open source software, licensed under the New BSD License and available for download (including the source code) at http://code.google.com/p/chared/. The research leading to this piece of software was published in POMIKÁLEK, Jan and Vít SUCHOMEL. chared: Character Encoding Detection with a Known Language. In Aleš Horák, Pavel Rychlý. RASLAN 2011. 5th ed. Brno, Czech Republic: Tribun EU, 2011. pp. 125-129, 5 pp. ISBN 978-80-263-0077-9. and PRESEMT, Lexical Computing Ltd
Rights:: BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB
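
Note (illustrative sketch, not chared's code or API): the idea of comparing byte trigram vectors for a known language can be sketched as below. The cosine similarity measure, the function names and the shortcut of building per-encoding models by re-encoding one sample sentence are assumptions; chared itself learns its models from sample HTML pages and may use a different similarity function.

```python
from collections import Counter
from math import sqrt

def byte_trigrams(data: bytes) -> Counter:
    """Count overlapping 3-byte sequences in the raw data."""
    return Counter(data[i:i + 3] for i in range(len(data) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(n * b[k] for k, n in a.items() if k in b)
    norm = sqrt(sum(n * n for n in a.values())) * sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0

def detect_encoding(data: bytes, sample_text: str, candidates):
    """Pick the candidate encoding whose trigram model is closest to the data.

    sample_text is text in the KNOWN language of the document; encoding it
    in each candidate encoding gives one toy model per encoding.
    """
    doc = byte_trigrams(data)
    models = {enc: byte_trigrams(sample_text.encode(enc)) for enc in candidates}
    return max(models, key=lambda enc: cosine(doc, models[enc]))

# Czech sample sentence standing in for a trained language model.
CZECH_SAMPLE = "Příliš žluťoučký kůň úpěl ďábelské ódy"
CANDIDATES = ["utf-8", "iso8859-2", "cp1250"]

if __name__ == "__main__":
    raw = "žluťoučký kůň".encode("cp1250")
    print(detect_encoding(raw, CZECH_SAMPLE, CANDIDATES))
```

Knowing the language is what makes the comparison work here: ISO 8859-2 and Windows-1250 share most byte values and differ only for a few Czech letters, which the trigram models pick up.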

31. CLaRK System - an XML-based system for Corpora Development

Publisher:: Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:: toolService
Subject:: corpus development
Description:: The CLaRK System incorporates several technologies: XML technology; Unicode; Cascaded Regular Grammars; Constraints over XML Documents. On the basis of these technologies the following tools are implemented: XML Editor, Unicode Tokeniser, Sorting tool, Removing and Extracting tool, Concordancer, XSLT tool, Cascaded Regular Grammar tool, etc.
1 Unicode tokenization. To make it possible to impose constraints over textual nodes and to segment them in a meaningful way, the CLaRK System supports a user-defined hierarchy of tokenisers. At the most basic level the user can define a tokeniser in terms of a set of token types. In this basic tokeniser each token type is defined by a set of Unicode symbols. Above these basic-level tokenisers, the user can define other tokenisers, for which the token types are defined as regular expressions over the tokens of some other tokeniser, the so-called parent tokeniser.
2 Regular Grammars. The regular grammars are the basic mechanism for linguistic processing of the content of an XML document within the system. The regular grammar processor applies a set of rules over the content of some elements in the document and incorporates the categories of the rules back into the document as XML mark-up. The content is processed before the application of the grammar rules in the following way: textual nodes are tokenized with respect to an appropriate tokeniser, and element nodes are textualized on the basis of XPath expressions that determine the important information about the element. The recognized word is replaced by new XML mark-up, which may or may not contain the word.
3 Constraints. The constraints implemented in the CLaRK System are generally based on the XPath language. XPath expressions are used to select data within one or several XML documents, and predicates are then evaluated over that data. There are two modes of using a constraint. In the first mode the constraint is used for a validity check, similar to the validity check based on a DTD or XML schema. In the second mode, the constraint is used to support changing the document so that it satisfies the constraint. Three types of constraints are implemented in the system: regular expression constraints, number restriction constraints, and value restriction constraints.
4 Macro Language. In the CLaRK System the tools support a mechanism for describing their settings. On the basis of these descriptions (called queries) a tool can be applied simply by pointing to a certain description record. Each query contains the states of all settings and options of the corresponding tool. Given such queries, a special tool combines and applies them in groups (macros). During application the queries are executed successively, and the result of one application is the input for the next. For better control over the process of applying several queries in one macro, there are several conditional operators. These operators can determine the next query to apply depending on certain conditions. When the condition of such an operator is satisfied, execution continues from a location defined in the operator. The mechanism for addressing queries is based on user-defined labels. When a condition is not satisfied, the operator is ignored and the process continues from the position following the operator. In this way constructions like IF-THEN-ELSE and WHILE-DO can easily be expressed. The system supports five types of control operators: IF (XPath): the condition is an XPath expression which is evaluated on the current working document; if the result is a non-empty node-set, non-empty string, positive number or true boolean value, the condition is satisfied; IF NOT (XPath): the same kind of condition as the previous one, but the result is negated; IF CHANGED: the condition is satisfied if the preceding operation has changed the current working document or has produced a non-empty result document (depending on the operation); IF NOT CHANGED: the condition is satisfied if either the previous operation did not change the working document or did not produce a non-empty result; GOTO: unconditionally changes the execution position. Each macro defined in the system can have its own query and can be incorporated in another macro. In this way a limited form of subroutine can be implemented. The new version of CLaRK will support server applications and calls to/from external programs.
Rights:: Not specified
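
Note (illustrative sketch, not CLaRK code): the control flow of the macro language described in the CLaRK entry (queries run in sequence, labels, and the five control operators) can be mimicked with a very small interpreter. Everything here is an assumption made for illustration: the document is a plain string rather than an XML document, queries are Python callables, IF / IF NOT take a Python predicate instead of an XPath expression, and the tuple-based step format and `max_steps` guard are invented.

```python
def run_macro(steps, document, max_steps=1000):
    """Run a list of steps over a document and return the final document.

    Each step is a tuple:
      ("LABEL", name)              target for jumps
      ("QUERY", function)          apply a tool; its output feeds the next step
      ("IF", predicate, label)     jump if predicate(document) is true
      ("IF NOT", predicate, label) jump if predicate(document) is false
      ("IF CHANGED", label)        jump if the last query changed the document
      ("IF NOT CHANGED", label)    jump if the last query changed nothing
      ("GOTO", label)              unconditional jump
    """
    labels = {step[1]: i for i, step in enumerate(steps) if step[0] == "LABEL"}
    pc, changed = 0, False
    for _ in range(max_steps):
        if pc >= len(steps):
            return document
        op, *args = steps[pc]
        pc += 1  # default: continue with the step after this one
        if op == "QUERY":
            new_document = args[0](document)
            changed = new_document != document
            document = new_document
        elif op == "IF":
            if args[0](document):
                pc = labels[args[1]]
        elif op == "IF NOT":
            if not args[0](document):
                pc = labels[args[1]]
        elif op == "IF CHANGED":
            if changed:
                pc = labels[args[0]]
        elif op == "IF NOT CHANGED":
            if not changed:
                pc = labels[args[0]]
        elif op == "GOTO":
            pc = labels[args[0]]
        elif op != "LABEL":
            raise ValueError(f"unknown operator: {op}")
    raise RuntimeError("macro did not terminate within max_steps")

# WHILE-DO expressed with a label and IF CHANGED:
# collapse double spaces until nothing changes, then trim the ends.
COLLAPSE_SPACES = [
    ("LABEL", "loop"),
    ("QUERY", lambda doc: doc.replace("  ", " ")),
    ("IF CHANGED", "loop"),
    ("QUERY", str.strip),
]

if __name__ == "__main__":
    print(repr(run_macro(COLLAPSE_SPACES, "  a    b  ")))
```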

32. CLaRK System - XML-based system for Corpora Development

Creator:: Simov, Kiril, Simov, Alex, and Kouylekov, Milen
Publisher:: Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:: toolService
Description:: The CLaRK System incorporates several technologies: XML technology; Unicode; Cascaded Regular Grammars; Constraints over XML Documents. On the basis of these technologies the following tools are implemented: XML Editor, Unicode Tokeniser, Sorting tool, Removing and Extracting tool, Concordancer, XSLT tool, Cascaded Regular Grammar tool, etc.
1 Unicode tokenization. To make it possible to impose constraints over textual nodes and to segment them in a meaningful way, the CLaRK System supports a user-defined hierarchy of tokenisers. At the most basic level the user can define a tokeniser in terms of a set of token types. In this basic tokeniser each token type is defined by a set of Unicode symbols. Above these basic-level tokenisers, the user can define other tokenisers, for which the token types are defined as regular expressions over the tokens of some other tokeniser, the so-called parent tokeniser.
2 Regular Grammars. The regular grammars are the basic mechanism for linguistic processing of the content of an XML document within the system. The regular grammar processor applies a set of rules over the content of some elements in the document and incorporates the categories of the rules back into the document as XML mark-up. The content is processed before the application of the grammar rules in the following way: textual nodes are tokenized with respect to an appropriate tokeniser, and element nodes are textualized on the basis of XPath expressions that determine the important information about the element. The recognized word is replaced by new XML mark-up, which may or may not contain the word.
3 Constraints. The constraints implemented in the CLaRK System are generally based on the XPath language. XPath expressions are used to select data within one or several XML documents, and predicates are then evaluated over that data. There are two modes of using a constraint. In the first mode the constraint is used for a validity check, similar to the validity check based on a DTD or XML schema. In the second mode, the constraint is used to support changing the document so that it satisfies the constraint. Three types of constraints are implemented in the system: regular expression constraints, number restriction constraints, and value restriction constraints.
4 Macro Language. In the CLaRK System the tools support a mechanism for describing their settings. On the basis of these descriptions (called queries) a tool can be applied simply by pointing to a certain description record. Each query contains the states of all settings and options of the corresponding tool. Given such queries, a special tool combines and applies them in groups (macros). During application the queries are executed successively, and the result of one application is the input for the next. For better control over the process of applying several queries in one macro, there are several conditional operators. These operators can determine the next query to apply depending on certain conditions. When the condition of such an operator is satisfied, execution continues from a location defined in the operator. The mechanism for addressing queries is based on user-defined labels. When a condition is not satisfied, the operator is ignored and the process continues from the position following the operator. In this way constructions like IF-THEN-ELSE and WHILE-DO can easily be expressed. The system supports five types of control operators: IF (XPath): the condition is an XPath expression which is evaluated on the current working document; if the result is a non-empty node-set, non-empty string, positive number or true boolean value, the condition is satisfied; IF NOT (XPath): the same kind of condition as the previous one, but the result is negated; IF CHANGED: the condition is satisfied if the preceding operation has changed the current working document or has produced a non-empty result document (depending on the operation); IF NOT CHANGED: the condition is satisfied if either the previous operation did not change the working document or did not produce a non-empty result; GOTO: unconditionally changes the execution position. Each macro defined in the system can have its own query and can be incorporated in another macro. In this way a limited form of subroutine can be implemented. The new version of CLaRK will support server applications and calls to/from external programs.
Rights:: Not specified
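
Note (illustrative sketch, not CLaRK code): the tokeniser hierarchy described in the CLaRK entry, where a basic tokeniser assigns token types by character class and a derived tokeniser defines its token types as regular expressions over the parent's tokens, might look roughly like this. The one-letter type codes, the grouping of adjacent characters of the same type into one token, and all function names are assumptions made for illustration.

```python
import re

def classify(ch):
    """Basic token type of a single character (one-letter codes, assumed)."""
    if ch.isalpha():
        return "L"  # letter
    if ch.isdigit():
        return "D"  # digit
    if ch.isspace():
        return "S"  # whitespace
    return "P"      # punctuation / other

def basic_tokenise(text):
    """Basic tokeniser: runs of same-type characters become one token.

    Punctuation characters are kept as separate tokens (an assumption).
    """
    tokens = []
    for ch in text:
        kind = classify(ch)
        if tokens and tokens[-1][0] == kind and kind != "P":
            tokens[-1] = (kind, tokens[-1][1] + ch)
        else:
            tokens.append((kind, ch))
    return tokens

def derived_tokenise(parent_tokens, rules):
    """Derived tokeniser: regular expressions over the PARENT token types.

    Because every parent type is a single letter, position i in the code
    string corresponds to parent token i.
    """
    codes = "".join(kind for kind, _ in parent_tokens)
    out, i = [], 0
    while i < len(codes):
        for name, pattern in rules:
            match = pattern.match(codes, i)
            if match and match.end() > i:
                text = "".join(s for _, s in parent_tokens[i:match.end()])
                out.append((name, text))
                i = match.end()
                break
        else:
            out.append(parent_tokens[i])  # no rule applies: keep parent token
            i += 1
    return out

# A derived token type: digits, one punctuation token, digits.
RULES = [("DECIMAL", re.compile(r"DPD"))]

if __name__ == "__main__":
    parent = basic_tokenise("pi is 3.14")
    print(parent)
    print(derived_tokenise(parent, RULES))
```

Note that the rule matches on token types only, so it would also group "3,14"; a real grammar would constrain the punctuation token further.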

33. COLDIC

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Tool for dictionary management
Rights:: Not specified

34. CorPipe 23 multilingual CorefUD 1.1 model (corpipe23-corefud1.1-231206)

Creator:: Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: coreference resolution, CorPipe, and CorefUD
Language:: Catalan, Czech, German, English, Spanish, French, Hungarian, Lithuanian, Norwegian Bokmål, Norwegian Nynorsk, Polish, Russian, and Turkish
Description:: The `corpipe23-corefud1.1-231206` is a `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released under the CC BY-NC-SA 4.0 license. The model is language agnostic (no _corpus id_ on input), so it can be used to predict coreference in any `mT5` language (for zero-shot evaluation, see the paper). However, note that the empty nodes must be present already on input, they are not predicted (the same settings as in the CRAC23 shared task).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

35. Corpus query for Estonian corpora

Publisher:: University of Tartu
Type:: toolService
Language:: Estonian
Description:: Web application for querying the automatically morphologically disambiguated Mixed corpus of Estonian
Rights:: Not specified

36. Corpus Work Bench CWB (CQP)

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: This SOAP service implements the IMS Open Corpus Workbench (CWB), a collection of open-source tools for managing and querying large text corpora (ranging from 10 million to 2 billion words) with linguistic annotations. Its central component is the flexible and efficient query processor CQP. The service makes it possible to index a new corpus and query it.
Rights:: Not specified

37. CorpusExplorer

Creator:: Rüdiger, Jan Oliver
Publisher:: Jan Oliver Rüdiger
Type:: tool and toolService
Subject:: Corpus Linguistics, NLP, conll, tei, XML, nlp, Natural Language Processing, linguistics, Linguistics, Computational Linguistics, corpus processing, tagger, POS tagger, lemmatization, text cleaning, CommonCrawl, epub, JSON, Twitter, Pandoc, Wikipedia, digital data, DTA, DSpin, MySQL, ElasticSearch, TextGrid, text corpora, TigerXML, and WeblichtXML
Language:: German, English, French, Italian, Dutch, Spanish, Polish, Arabic, Chinese, and Portuguese
Description:: Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks such as text acquisition, cleaning or tagging are completely automated. The simple interface supports the use in university teaching and leads users/students to fast and substantial results. The CorpusExplorer is open for many standards (XML, CSV, JSON, R, etc.) and also offers its own software development kit (SDK). Source code available at https://github.com/notesjor/corpusexplorer2.0
Rights:: Not specified

38. Croatian Lemmatization Server

Publisher:: University of Zagreb, Faculty of Humanities and Social Sciences
Type:: toolService
Language:: Croatian
Description:: Online service for lemmatization and full POS or MSD tagging of Croatian texts.
Rights:: Not specified

39. CST's lemmatiser

Publisher:: Center for Sprogteknologi, University of Copenhagen
Type:: toolService
Language:: Danish, Dutch, English, German, Modern Greek (1453-), Icelandic, Norwegian, Russian, Slovenian, and Swedish
Description:: 1) Fully automatic rule based lemmatization of inflected languages 2) Fully automatic training of lemmatization rules based on full form-lemma list
Rights:: Not specified

40. CST's lemmatizer

Creator:: Jongejan, Bart
Publisher:: Københavns Universitet, Center for Sprogteknologi (CST)
Type:: toolService
Description:: 1) Fully automatic rule based lemmatization of inflected languages 2) Fully automatic training of lemmatization rules based on full form-lemma list
Rights:: Not specified

41. CUBBITT Translation Models (en-cs) (v1.0)

Creator:: Popel, Martin, Tomková, Markéta, Tomek, Jakub, Kaiser, Łukasz, Uszkoreit, Jakob, Bojar, Ondřej, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: machine translation, neural machine translation, transformer, and cubbitt
Language:: English and Czech
Description:: CUBBITT En-Cs translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2014 (BLEU): en->cs: 27.6 cs->en: 34.4 (Evaluated using multeval: https://github.com/jhclark/multeval)
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

42. CUBBITT Translation Models (en-fr) (v1.0)

Creator:: Popel, Martin, Tomková, Markéta, Tomek, Jakub, Kaiser, Łukasz, Uszkoreit, Jakob, Bojar, Ondřej, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: machine translation, neural machine translation, transformer, and cubbitt
Language:: English and French
Description:: CUBBITT En-Fr translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2014 (BLEU): en->fr: 38.2 fr->en: 36.7 (Evaluated using multeval: https://github.com/jhclark/multeval)
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

43. CUBBITT Translation Models (en-pl) (v1.0)

Creator:: Popel, Martin, Tomková, Markéta, Tomek, Jakub, Kaiser, Łukasz, Uszkoreit, Jakob, Bojar, Ondřej, and Žabokrtský, Zdeněk
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: machine translation, neural machine translation, transformer, and cubbitt
Language:: English and Polish
Description:: CUBBITT En-Pl translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2020 (BLEU): en->pl: 12.3 pl->en: 20.0 (Evaluated using multeval: https://github.com/jhclark/multeval)
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
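
The CUBBITT records above state that the models are available in the Lindat translation service. The following is a minimal sketch of how a client might prepare a request to that service. The API path under the service URL, the "<src>-<tgt>" model naming, and the `input_text` form field are assumptions that should be checked against the service documentation; `build_translation_request` is a hypothetical helper and not part of any published client.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed REST path below the service URL named in the records; verify before use.
API_BASE = "https://lindat.mff.cuni.cz/services/translation/api/v2/models"


def build_translation_request(text: str, src: str, tgt: str) -> Request:
    """Build (but do not send) a POST request for one translation direction."""
    # Assumption: models are addressed as "<src>-<tgt>", e.g. "en-fr".
    url = f"{API_BASE}/{src}-{tgt}?{urlencode({'src': src, 'tgt': tgt})}"
    # Assumption: the text travels in a form field named "input_text".
    body = urlencode({"input_text": text}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "text/plain",
    }
    return Request(url, data=body, headers=headers, method="POST")


request = build_translation_request("Good morning.", "en", "fr")
print(request.get_method(), request.full_url)
```

The request object can be passed to `urllib.request.urlopen` once the endpoint has been confirmed; building it separately keeps the sketch usable offline.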

44. Cyril Belica : Kookkurrenzdatenbank CCDB

Publisher:: Institut für Deutsche Sprache
Type:: toolService
Language:: German
Description:: A co-occurrence database, developed by the Institut für Deutsche Sprache, for research in the field of collocation analysis in modern German. The database holds over 200,000 analysed words that can be browsed or searched and shown in context.
Rights:: Not specified

45. Czech image captioning, machine translation, and sentiment analysis (Neural Monkey models)

Creator:: Libovický, Jindřich, Rosa, Rudolf, Helcl, Jindřich, and Popel, Martin
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: suiteOfTools and toolService
Subject:: sentiment analysis, machine translation, image captioning, neural networks, transformer, and Neural Monkey
Language:: Czech and English
Description:: This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving three NLP tasks: machine translation, image captioning, and sentiment analysis. The models are trained on standard datasets and achieve state-of-the-art or near state-of-the-art performance in the tasks. The models are described in the accompanying paper. The same models can also be invoked via the online demo: https://ufal.mff.cuni.cz/grants/lsd There are several separate ZIP archives here, each containing one model solving one of the tasks for one language. To use a model, you first need to install Neural Monkey: https://github.com/ufal/neuralmonkey To ensure correct functioning of the model, please use the exact version of Neural Monkey specified by the commit hash stored in the 'git_commit' file in the model directory. Each model directory contains a 'run.ini' Neural Monkey configuration file, to be used to run the model. See the Neural Monkey documentation to learn how to do that (you may need to update some paths to correspond to your filesystem organization). The 'experiment.ini' file, which was used to train the model, is also included. Then there are files containing the model itself, files containing the input and output vocabularies, etc. For the sentiment analyzers, you should tokenize your input data using the Moses tokenizer: https://pypi.org/project/mosestokenizer/ For the machine translation, you do not need to tokenize the data, as this is done by the model. 
For image captioning, you need to:
- download a trained ResNet: http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz
- clone the git repository with TensorFlow models: https://github.com/tensorflow/models
- preprocess the input images with the Neural Monkey 'scripts/imagenet_features.py' script (https://github.com/ufal/neuralmonkey/blob/master/scripts/imagenet_features.py); you need to pass the paths to ResNet and to the TensorFlow models to this script
Feel free to contact the authors of this submission if you run into problems.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
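
The description above asks users to run the exact Neural Monkey version named in each model's 'git_commit' file. A minimal sketch of that pinning step follows. It assumes the file holds the commit hash as its first whitespace-separated token (the record does not specify the file layout); `read_pinned_commit` and `checkout_command` are hypothetical helper names, and the hash in the demonstration is made up.

```python
import tempfile
from pathlib import Path
from typing import List


def read_pinned_commit(model_dir: str) -> str:
    """Return the commit hash stored in the model's 'git_commit' file."""
    content = Path(model_dir, "git_commit").read_text(encoding="utf-8")
    # Assumption: the hash is the first whitespace-separated token in the file.
    return content.split()[0]


def checkout_command(repo_dir: str, commit: str) -> List[str]:
    """Build the git command that pins a Neural Monkey clone to that commit."""
    return ["git", "-C", repo_dir, "checkout", commit]


# Demonstration with a throwaway model directory and a made-up hash.
with tempfile.TemporaryDirectory() as model_dir:
    Path(model_dir, "git_commit").write_text("0123abc\n", encoding="utf-8")
    commit = read_pinned_commit(model_dir)
    command = checkout_command("neuralmonkey", commit)
    print(" ".join(command))
```

The returned list can be handed to `subprocess.run(command, check=True)` inside a real clone of the repository.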

46. Czech image captioning, machine translation, sentiment analysis and summarization (Neural Monkey models)

Creator:: Libovický, Jindřich, Rosa, Rudolf, Helcl, Jindřich, and Popel, Martin
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: suiteOfTools and toolService
Subject:: sentiment analysis, machine translation, image captioning, neural networks, transformer, Neural Monkey, and summarization
Language:: Czech and English
Description:: This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving four NLP tasks: machine translation, image captioning, sentiment analysis, and summarization. The models are trained on standard datasets and achieve state-of-the-art or near state-of-the-art performance in the tasks. The models are described in the accompanying paper. The same models can also be invoked via the online demo: https://ufal.mff.cuni.cz/grants/lsd In addition to the models presented in the referenced paper (developed and published in 2018), we include models for automatic news summarization for Czech and English developed in 2019. The Czech models were trained using the SumeCzech dataset (https://www.aclweb.org/anthology/L18-1551.pdf), the English models were trained using the CNN-Daily Mail corpus (https://arxiv.org/pdf/1704.04368.pdf) using the standard recurrent sequence-to-sequence architecture. There are several separate ZIP archives here, each containing one model solving one of the tasks for one language. To use a model, you first need to install Neural Monkey: https://github.com/ufal/neuralmonkey To ensure correct functioning of the model, please use the exact version of Neural Monkey specified by the commit hash stored in the 'git_commit' file in the model directory. Each model directory contains a 'run.ini' Neural Monkey configuration file, to be used to run the model. See the Neural Monkey documentation to learn how to do that (you may need to update some paths to correspond to your filesystem organization). The 'experiment.ini' file, which was used to train the model, is also included. Then there are files containing the model itself, files containing the input and output vocabularies, etc. For the sentiment analyzers, you should tokenize your input data using the Moses tokenizer: https://pypi.org/project/mosestokenizer/ For the machine translation, you do not need to tokenize the data, as this is done by the model. 
For image captioning, you need to:
- download a trained ResNet: http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz
- clone the git repository with TensorFlow models: https://github.com/tensorflow/models
- preprocess the input images with the Neural Monkey 'scripts/imagenet_features.py' script (https://github.com/ufal/neuralmonkey/blob/master/scripts/imagenet_features.py); you need to pass the paths to ResNet and to the TensorFlow models to this script
The summarization models require input that is tokenized with Moses Tokenizer (https://github.com/alvations/sacremoses) and lower-cased. Feel free to contact the authors of this submission if you run into problems.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

47. Czech Morphological Analyzer v1

Creator:: Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and service
Subject:: morphological analysis and lemmatization
Language:: Czech
Description:: One of the very first steps in automatic processing of Czech text is morphological analysis and lemmatization.
Rights:: Not specified

48. Czech PDT-C 1.0 Model for UDPipe 2 (2023-11-16)

Creator:: Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: tokenizer, POS tagger, lemmatization, parser, dependency parser, MorfFlex CZ 2.0, and PDT-C 1.0
Language:: Czech
Description:: Tokenizer, POS Tagger, Lemmatizer, and Parser model based on the PDT-C 1.0 treebank (https://hdl.handle.net/11234/1-3185). The model documentation including performance can be found at https://ufal.mff.cuni.cz/udpipe/2/models#czech_pdtc1.0_model . To use these models, you need UDPipe version 2.1, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

49. Dendrarium

Publisher:: Institute of Computer Science, Polish Academy of Sciences
Type:: toolService
Description:: Coordinates the work of a group of linguists who select the appropriate parse trees from the many generated ones. It assigns parts of the task, signals differences in annotation, and allows a supervisor to correct them.
Rights:: Not specified

50. Depfix: Automatic Post-editing of SMT

Creator:: Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: machine translation, post-editing, Treex, morphology, and parsing
Language:: English and Czech
Description:: Depfix, a tool for Automatic Post-editing of SMT. See the project website for more information.
Rights:: GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB

51. Dialogy.Org

Creator:: Peterek, Nino
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and service
Subject:: multimedia corpora search service
Description:: The Dialogy.Org system allows users to search in transcribed audio-visual corpora. Dialogy.Org is a web-based interface, so no additional software needs to be installed on your computer. Flash Player is required for playing audio or video recordings. This work has used language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
Rights:: Not specified

52. Digital archive of Finnish Folk Tunes

Publisher:: Department of Music, University of Jyväskylä
Type:: toolService
Language:: Finnish
Description:: Digitized versions of 8613 Finnish folk tunes (including part of the lyrics) and their relevant details (notation, key, meter, place of collection, lyrics, collector).
Rights:: Not specified

53. DiSi: Flexible Dialogue System

Publisher:: Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
Type:: toolService
Language:: Spanish
Description:: Dialogue manager
Rights:: Not specified

54. Dspace modifications for use of EPIC handles

Creator:: Pajas, Petr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService
Subject:: DSpace, handle, and EPIC
Description:: Modifications to DSpace made by Petr Pajas in order to support the pidconsortium.eu PID handle system instead of the default handle.net system used by DSpace.
Rights:: BSD 2-Clause "Simplified" or "FreeBSD" license, http://opensource.org/licenses/BSD-2-Clause, and PUB

55. DTAG dependency treebank tool

Publisher:: Copenhagen Business School
Type:: toolService
Description:: DTAG is a versatile annotation tool that supports manual and semi-automatic annotation of a wide range of linguistic phenomena, including the annotation of syntax, discourse, coreference, morphology, and word and phrase alignments. It includes commands for editing general labeled graphs and graph alignments, comparing annotations, managing annotation tasks, and interfacing with a revision control system. Its visualization component can display graphs and alignments for entire texts in a compact format, with a highly flexible and configurable formatting scheme. It also provides a powerful search-replace mechanism with queries based on full first-order logic, which can be used to search for linguistic constructions and automatically apply graph transformations to collections of annotated graphs. The visualization component does not currently support characters outside the ISO-latin character set.
Rights:: Not specified

56. DZ Interset

Creator:: Zeman, Daniel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and tool
Subject:: morphology, NLP, and Perl
Description:: DZ Interset is a means of converting among various tag sets in natural language processing. The core idea is similar to interlingua-based machine translation. DZ Interset defines a set of features that are encoded by the various tag sets. The set of features should be as universal as possible. It does not need to encode everything that is encoded by any tag set, but it should encode all information that people may want to access and/or port from one tag set to another. New tag sets are attached by writing a driver for them. Once the driver is ready, you can easily convert tags between the new set and any other set for which you also have a driver. This reusability is an obvious advantage over writing a targeted conversion procedure each time you need to convert between a particular pair of tag sets. Supported by grant MSM 0021620838 of the Ministry of Education of the Czech Republic.
Rights:: GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB
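
The driver idea described above can be illustrated with a small sketch: each driver decodes a tag into a shared feature structure and encodes a feature structure back into a tag, so any two tag sets with drivers can be converted into each other. DZ Interset itself is written in Perl; the Python class, the two toy tag sets, and the feature names below are invented for illustration and do not reproduce its actual API.

```python
from typing import Dict

Features = Dict[str, str]


class Driver:
    """Maps one tag set to and from the shared feature structure."""

    def __init__(self, table: Dict[str, Features]):
        self._decode = table
        # Invert the table so that features can be turned back into tags.
        self._encode = {tuple(sorted(f.items())): tag for tag, f in table.items()}

    def decode(self, tag: str) -> Features:
        return dict(self._decode[tag])

    def encode(self, features: Features) -> str:
        return self._encode[tuple(sorted(features.items()))]


# Two invented toy tag sets; real drivers cover far more features.
DRIVERS = {
    "toy_a": Driver({
        "NN": {"pos": "noun", "number": "sing"},
        "NNS": {"pos": "noun", "number": "plur"},
        "VB": {"pos": "verb"},
    }),
    "toy_b": Driver({
        "noun-sg": {"pos": "noun", "number": "sing"},
        "noun-pl": {"pos": "noun", "number": "plur"},
        "verb": {"pos": "verb"},
    }),
}


def convert(tag: str, source: str, target: str) -> str:
    """Convert a tag by decoding it to features and encoding them again."""
    return DRIVERS[target].encode(DRIVERS[source].decode(tag))


print(convert("NNS", "toy_a", "toy_b"))  # noun-pl
```

Adding a third tag set only requires one more driver, not a converter for every pair. This sketch matches feature structures exactly; handling features that a target tag set cannot express is left out.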

57. EFCL Channelizer

Creator:: Klusáček, David
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: Fast Channelizer, Filterbank, ASR Front End, Software Defined Radio, Polyphase Filter, Frequency Multiplexing, Audio Denoising, High Performance Computing, HPC, SDR, FFT, FFTW, SIMD, AVX, SSE, and NEON
Description:: Extremely fast digital audio channelizer implementation, usable as a building block for experimental ASR front-ends or signal denoising applications. Also applicable in software defined radios, due to its high throughput. It comes in the form of a C/C++ library and an executable example program which reads an input stream, splits it into equidistant frequency channels, and emits their data to the output. Features: (1) Hand-tuned SIMD-aware assembly for x86 (SSE) and x86-64 (AVX) as well as for ARM (NEON) processors. (2) Generic non-SIMD C++ implementation for other architectures. (3) Capable of taking advantage of multicore CPUs. (4) Fully configurable number of channels and output decimation rate. (5) User-supplied FIR of the channel separation filter, which allows the user to specify the width of the channels and whether they should overlap or be separated. (6) Input and output signal samples are treated as complex numbers. (7) Speed over 750 complex MS/s achieved on Core i7 4710HQ @ 2.5GHz, when channelizing into 72 output channels with a FIR length of 1152 samples, using 3 computing threads. (8) Runs under Linux OS.
Rights:: Mozilla Public License 2.0, http://opensource.org/licenses/MPL-2.0, and PUB
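
To make the notion of a channelizer concrete, here is a minimal, unoptimized sketch of the general technique (the weighted overlap-add form of a polyphase DFT filter bank) in pure Python. It is not the library's API or implementation: the function name, its parameters, and the Hamming-shaped prototype filter are assumptions chosen for illustration. The sketch also assumes that the FIR length is a multiple of the channel count, as in the 1152-tap, 72-channel configuration quoted above.

```python
import cmath
import math
from typing import List


def channelize(samples: List[complex], fir: List[float],
               channels: int, decimation: int) -> List[List[complex]]:
    """Split a complex signal into equally spaced frequency channels.

    Returns one list per output frame, holding one complex sample per channel.
    The FIR length must be a multiple of the channel count.
    """
    if len(fir) % channels != 0:
        raise ValueError("FIR length must be a multiple of the channel count")
    frames = []
    for start in range(0, len(samples) - len(fir) + 1, decimation):
        # Weight the current input window with the prototype filter.
        weighted = [samples[start + i] * fir[i] for i in range(len(fir))]
        # Fold the window into `channels` polyphase branches.
        folded = [sum(weighted[m::channels]) for m in range(channels)]
        # A DFT across the branches yields one sample per channel.
        frames.append([
            sum(folded[m] * cmath.exp(-2j * math.pi * k * m / channels)
                for m in range(channels))
            for k in range(channels)
        ])
    return frames


# Demo: a tone centred on channel 3 of 8, Hamming-shaped 32-tap prototype.
CHANNELS, TAPS, TONE = 8, 32, 3
fir = [0.54 - 0.46 * math.cos(2 * math.pi * i / (TAPS - 1)) for i in range(TAPS)]
signal = [cmath.exp(2j * math.pi * TONE * t / CHANNELS) for t in range(128)]
frames = channelize(signal, fir, CHANNELS, decimation=CHANNELS)
energy = [sum(abs(frame[k]) ** 2 for frame in frames) for k in range(CHANNELS)]
strongest = max(range(CHANNELS), key=energy.__getitem__)
print(strongest)
```

The demo reports the channel that carries the most energy, which is the channel the test tone was placed on. When the decimation rate is not a multiple of the channel count, each channel additionally needs a per-frame phase rotation, which this sketch omits.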

58. ELAN

Publisher:: Max Planck Institute for Psycholinguistics
Type:: toolService
Description:: Multimodal annotation tool
Rights:: Not specified

59. ElixirFM

Creator:: Smrž, Otakar, Bielický, Viktor, and Buckwalter, Tim
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService
Subject:: Arabic morphology and ElixirFM
Language:: Arabic
Description:: ElixirFM is a high-level implementation of Functional Arabic Morphology documented at http://elixir-fm.wiki.sourceforge.net/. The core of ElixirFM is written in Haskell, while interfaces in Perl support lexicon editing and other interactions.
Rights:: http://opensource.org/licenses/GPL-3.0

60. Ellogon

Type:: toolService
Description:: Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, Ellogon offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (like creating and embedding lexicons), tools for creating annotated corpora, accessing databases, comparing annotated data, or transforming linguistic information into vectors for use with various machine learning algorithms.
Rights:: Not specified

61. EMU Speech Database System

Publisher:: Institute of Phonetics and Speech Processing, LMU Munich
Type:: toolService
Description:: EMU is a collection of software tools for the creation, manipulation and analysis of speech databases. At the core of EMU is a database search engine which allows the researcher to find various speech segments based on the sequential and hierarchical structure of the utterances in which they occur. EMU includes an interactive labeller which can display spectrograms and other speech waveforms, and which allows the creation of hierarchical, as well as sequential, labels for a speech utterance.
Rights:: Not specified

62. English-Latvian SMT system

Publisher:: Institute of Mathematics and Computer Science, University of Latvia
Type:: toolService
Language:: English
Description:: A factored English-Latvian SMT system that uses the Moses decoder, trained on JRC-Acquis and some other parallel texts.
Rights:: Not specified

63. English-Lithuanian Machine Translation Service

Publisher:: Center of Computational Linguistics, Vytautas Magnus University
Type:: toolService
Language:: English and Lithuanian
Description:: On-line freely accessible machine translation tool for translating English webpages or texts into Lithuanian.
Rights:: Not specified

68. EvaLatin 2020 models for UDPipe 2 (2020-08-31)

Creator:: Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: POS tagger, lemmatization, and tagger
Language:: Latin
Description:: POS Tagger and Lemmatizer models for EvaLatin2020 data (https://github.com/CIRCSE/LT4HALA). The model documentation including performance can be found at https://ufal.mff.cuni.cz/udpipe/2/models#evalatin20_models . To use these models, you need UDPipe version at least 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

69. EVALD 1.0

Creator:: Rysová, Kateřina, Mírovský, Jiří, Novák, Michal, and Rysová, Magdaléna
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: text coherence, discourse, automatic evaluation, and native speakers
Language:: Czech
Description:: EVALD 1.0 automatically evaluates surface coherence (cohesion) in Czech texts written by native speakers of Czech.
Rights:: BSD 2-Clause "Simplified" or "FreeBSD" license, http://opensource.org/licenses/BSD-2-Clause, and PUB

70. EVALD 1.0 for Foreigners

Creator:: Rysová, Kateřina, Mírovský, Jiří, Novák, Michal, and Rysová, Magdaléna
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: text coherence, discourse, automatic evaluation, and non-native speakers
Language:: Czech
Description:: EVALD 1.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
Rights:: BSD 2-Clause "Simplified" or "FreeBSD" license, http://opensource.org/licenses/BSD-2-Clause, and PUB

71. EVALD 2.0

Creator:: Novák, Michal, Rysová, Kateřina, Mírovský, Jiří, Rysová, Magdaléna, and Hajičová, Eva
Publisher:: Charles University, UFAL
Type:: tool and toolService
Subject:: text coherence, discourse, automatic evaluation, and native speakers
Language:: Czech
Description:: EVALD 2.0 automatically evaluates surface coherence (cohesion) in Czech texts written by native speakers of Czech.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

72. EVALD 2.0 for Foreigners

Creator:: Novák, Michal, Rysová, Kateřina, Mírovský, Jiří, Rysová, Magdaléna, and Hajičová, Eva
Publisher:: Charles University, UFAL
Type:: tool and toolService
Subject:: text coherence, discourse, automatic evaluation, and non-native speakers
Language:: Czech
Description:: EVALD 2.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

73. EVALD 3.0 – Evaluator of Discourse

Creator:: Mírovský, Jiří, Novák, Michal, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
Publisher:: Charles University, UFAL
Type:: tool and toolService
Subject:: text coherence, discourse, automatic evaluation, and native speakers
Language:: Czech
Description:: EVALD 3.0 automatically evaluates surface coherence (cohesion) in Czech texts written by native speakers of Czech.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

74. EVALD 3.0 for Foreigners – Evaluator of Discourse

Creator:: Mírovský, Jiří, Novák, Michal, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
Publisher:: Charles University, UFAL
Type:: tool and toolService
Subject:: text coherence, discourse, automatic evaluation, and non-native speakers
Language:: Czech
Description:: EVALD 3.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

75. EVALD 4.0 – Evaluator of Discourse

Creator:: Novák, Michal, Mírovský, Jiří, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: text coherence, discourse, automatic evaluation, and non-native speakers
Language:: Czech
Description:: EVALD 4.0 automatically evaluates surface coherence (cohesion) in Czech texts written by native speakers of Czech.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

76. EVALD 4.0 for Beginners – Evaluator of Discourse

Creator:: Novák, Michal, Mírovský, Jiří, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: text coherence, discourse, automatic evaluation, and non-native speakers
Language:: Czech
Description:: EVALD 4.0 for Beginners is software for automatic evaluation of Czech texts written by non-native speakers of Czech who are language beginners.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

77. EVALD 4.0 for Foreigners – Evaluator of Discourse

Creator:: Novák, Michal, Mírovský, Jiří, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: text coherence, discourse, automatic evaluation, and non-native speakers
Language:: Czech
Description:: EVALD 4.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

78. eXist

Publisher:: Språkbanken, Dept. of Swedish Language, Göteborg University
Type:: toolService
Description:: eXist-db is an open source database management system entirely built on XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
Rights:: Not specified

79. Extract

Creator:: Forsberg, Markus and Ranta, Aarne
Publisher:: Språkbanken, Dept. of Swedish Language, Göteborg University
Type:: toolService
Subject:: morphology extraction
Description:: Extract is a tool for supervised morphological lexicon extraction from raw text data.
Rights:: Not specified

80. Fairytale child

Creator:: Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and tool
Subject:: dialogue system, morphological generation, Treex, morphological analysis, and interactive
Language:: English and Czech
Description:: Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible. The work has been supported by GAUK 1572314 and SVV 260104. It uses language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
Rights:: GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB

81. Fairytale child (2014-09-26)

Creator:: Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and tool
Subject:: dialogue system, morphological generation, Treex, morphological analysis, and interactive
Language:: English and Czech
Description:: Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible. The work has been supported by GAUK 1572314 and SVV 260104. It uses language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
Rights:: GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB

82. Fairytale child (2014-09-30)

Creator:: Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and tool
Subject:: dialogue system, morphological generation, Treex, morphological analysis, and interactive
Language:: English and Czech
Description:: Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible. The work has been supported by GAUK 1572314 and SVV 260104. It uses language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
Rights:: GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB

83. Fairytale child (2014-11-21)

Creator:: Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and tool
Subject:: dialogue system, morphological generation, Treex, morphological analysis, and interactive
Language:: English and Czech
Description:: Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible. The work has been supported by GAUK 1572314 and SVV 260104. It uses language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
Rights:: GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB

84. Feature-based tagger

Creator:: Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService
Subject:: morphology and tagger
Description:: The Feature-based (exponential model) Tagger is a fast implementation of the Czech tagger developed at UFAL and described in the PDT 1.0 documentation (Czech Language Tagging page). To get the best possible results, the tagger requires preprocessing by a Czech morphological module with very high coverage. This module covers a superset of the Czech "FM" morphology. Both the morphological module and the tagger are supplied as binary executables, together with all necessary precompiled Czech data. Input must be in the ISO Latin 2 (iso-8859-2) encoding and follow the csts.dtd definition, and output is produced in the same form (ISO Latin 2 encoding, csts.dtd). (As is the case with many of the tools provided with PDT 1.0, both executables also accept, and then produce, a "simplified SGML", which is not real, valid SGML, but simply contains at least the tags for words, punctuation, and sentence breaks, one item per line.)
Rights:: PDT 2.0 License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-pdt2, and ACA
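
Because the tagger description above requires ISO Latin 2 (iso-8859-2) input and produces output in the same encoding, text held as Unicode has to be transcoded before and after processing. The following is only a minimal sketch of that conversion using Python's standard codecs; the function names `to_latin2` and `from_latin2` are hypothetical helpers introduced for illustration and are not part of the tagger distribution, and the csts.dtd markup itself is not handled here.

```python
def to_latin2(text: str) -> bytes:
    """Encode Unicode text as ISO-8859-2 bytes.

    Raises UnicodeEncodeError if the text contains a character
    that ISO-8859-2 cannot represent.
    """
    return text.encode("iso-8859-2")


def from_latin2(data: bytes) -> str:
    """Decode ISO-8859-2 bytes back into Unicode text."""
    return data.decode("iso-8859-2")


# A Czech sample sentence; every character it uses exists in ISO-8859-2.
sample = "Příliš žluťoučký kůň"
encoded = to_latin2(sample)
decoded = from_latin2(encoded)
```

ISO-8859-2 is a single-byte encoding, so the encoded form has one byte per character. Characters outside its repertoire raise an error with the default strict handling, which is usually preferable to silently replacing them before tagging.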

85. Fine-Tracker

Publisher:: Centre for Language and Speech Technology, Radboud University
Type:: toolService
Description:: Computational model of human word recognition; Fine-phonetic detail
Rights:: Not specified

86. FOLKER

Publisher:: Institut für Deutsche Sprache
Type:: toolService
Description:: Audio transcription editor used for the construction of the FOLK corpus
Rights:: Not specified

87. ForFun 1.0

Creator:: Mikulová, Marie and Bejček, Eduard
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: service and toolService
Subject:: form, function, database, and syntax
Language:: Czech
Description:: ForFun is a database of linguistic forms and their syntactic functions built using the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The purpose of the Prague Database of Forms and Functions (ForFun) is to help linguists study the form-function relation, which we assume to be one of the principal tasks of both theoretical linguistics and natural language processing. A prototypical question to be asked is "What purposes does the preposition 'po' serve?" or "What are the linguistic means in the sentence that can express the meaning 'a destination of an action'?". There are almost 1500 distinct forms (besides the 'po' preposition) and 65 distinct functions (besides the 'destination').
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

88. freeling

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Web service consisting of the Freeling open source language analysis tool suite.
Rights:: Not specified

89. FreeLing

Publisher:: Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
Type:: toolService
Language:: Catalan, English, Galician, Italian, Portuguese, and Welsh
Description:: Open source language analysis tool suite: tokenizer, stemmer/lemmatizer, named entity recognizer, chunker/segmenter, morphosyntactic tagger, syntactic tagger, corpus processor, morphological tagger, semantic tagger, analyzer, word sense disambiguator.
Rights:: Not specified

90. freeling_dependency

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Freeling-based dependency parser.
Rights:: Not specified

91. freeling_morpho

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Freeling-based morphological analyzer.
Rights:: Not specified

92. freeling_parsed

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Freeling-based shallow parser.
Rights:: Not specified

93. freeling_tagging

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Freeling-based part-of-speech tagger.
Rights:: Not specified

94. freeling_tokenizer

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Freeling-based text tokenizer.
Rights:: Not specified

95. Frequency list: Early Modern Finnish

Publisher:: The Research Institute for the Languages of Finland
Type:: toolService
Subject:: word frequencies
Language:: Finnish
Description:: Frequency list of the Corpus of Early Modern Finnish, 4 862 190 words
Rights:: Not specified

96. Frequency list: Old Literary Finnish

Publisher:: The Research Institute for the Languages of Finland
Type:: toolService
Language:: Finnish
Description:: Frequency list of the Corpus of Old Literary Finnish, 3 425 382 words
Rights:: Not specified

97. Functional Morphology

Creator:: Forsberg, Markus and Ranta, Aarne
Publisher:: Språkbanken, Dept. of Swedish Language, Göteborg University
Type:: toolService
Subject:: morphology
Description:: Functional Morphology is a development environment for computational morphologies.
Rights:: Not specified

98. GATE-ANNIE

Publisher:: University of Sheffield
Type:: toolService
Description:: GATE-ANNIE, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver and named entity recogniser. The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages). GATE-ANNIE returns the fully annotated document in GATE XML format. The file saved by the client contains ANNIE's output in the default AnnotationSet and the input document's HTML or XML mark-up in the "Original markups" AnnotationSet. H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02).
Rights:: Not specified

99. GATE-ANNIE-RDF

Publisher:: University of Sheffield
Type:: toolService
Description:: ANNIE-RDF, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver and named entity recogniser. The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages). The text spans and annotations are exported into an RDF-XML ontology, in which the recognized named entities are instances according to the PROTON ontology (http://proton.semanticweb.org/). H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02).
Rights:: Not specified

100. Gesprächanalytisches Informationssystem (GAIS)

Publisher:: Institut für Deutsche Sprache
Type:: toolService
Language:: German
Description:: web-based information system on scientific community (news, events, persons, job market, mailing list, database on research projects and corpora, bibliography, glossary and links) and recording equipment/software; disciplinary scope: research on conversation and discourse analysis and spoken language
Rights:: Not specified
