Search Results
2. ABC - Language Identifier
- Publisher:
- Research Institute for Artificial Intelligence, Romanian Academy of Sciences
- Type:
- toolService
- Description:
- The application, developed in C#, automatically identifies the language of a text written in one of 21 European Union languages. Using training texts in each language (approx. 1.5 MB of text per language), a training module counts the prefixes (the first 3 characters) and the suffixes (the last 4 characters) of all the words in the texts. For every language, two models are constructed, containing the weights (percentages) of the prefixes and suffixes in the texts representing that language. In the prediction phase, two models are built on the fly for the new text in the same manner. These models are then compared with the stored models for each language on which the application was trained, and comparison functions select the best-matching model. More detailed descriptions are available in the following papers (http://www.racai.ro/~tufis/papers): -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2008). RACAI's Linguistic Web Services. In Proceedings of the 6th Language Resources and Evaluation Conference - LREC 2008, Marrakech, Morocco, May 2008. ELRA - European Language Resources Association. ISBN 2-9517408-4-0. -- Dan Tufiş and Alexandru Ceauşu (2007). Diacritics Restoration in Romanian Texts. In Elena Paskaleva and Milena Slavcheva (eds.), A Common Natural Language Processing Paradigm for Balkan Languages - RANLP 2007 Workshop Proceedings, pp. 49-56, Borovets, Bulgaria, September 2007. INCOMA Ltd., Shoumen, Bulgaria. ISBN 978-954-91743-8-0. -- Dan Tufiş and Adrian Chiţu (1999). Automatic Insertion of Diacritics in Romanian Texts. In Ferenc Kiefer, Gábor Kiss, and Júlia Pajzs (eds.), Proceedings of the 5th International Workshop on Computational Lexicography (COMPLEX 1999), pp. 185-194, Pecs, Hungary, May 1999. Linguistics Institute, Hungarian Academy of Sciences.
- Rights:
- Not specified
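The prefix/suffix approach described for the ABC Language Identifier can be illustrated with a minimal Python sketch. This is not the RACAI implementation (which is written in C#): the function names and the overlap-based comparison are assumptions made for illustration, since the record does not say which comparison functions the tool uses.

```python
from collections import Counter


def normalise(counts):
    """Turn raw counts into relative weights that sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


def affix_models(text, prefix_len=3, suffix_len=4):
    """Build the two models for a text: prefix weights and suffix weights."""
    words = [w for w in text.lower().split() if w.isalpha()]
    prefixes = Counter(w[:prefix_len] for w in words if len(w) >= prefix_len)
    suffixes = Counter(w[-suffix_len:] for w in words if len(w) >= suffix_len)
    return normalise(prefixes), normalise(suffixes)


def overlap(a, b):
    """Assumed comparison function: probability mass shared by two models."""
    return sum(min(weight, b.get(key, 0.0)) for key, weight in a.items())


def identify(text, trained):
    """Return the language whose stored models best match the text's models.

    `trained` maps a language code to the (prefix, suffix) pair returned by
    affix_models() for that language's training text.
    """
    prefixes, suffixes = affix_models(text)
    return max(
        trained,
        key=lambda lang: overlap(prefixes, trained[lang][0])
        + overlap(suffixes, trained[lang][1]),
    )
```

To use the sketch, build `trained` by calling `affix_models` once per language on its training text, then pass it to `identify` together with the text to classify.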
3. Access rights Management System
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- toolService
- Description:
- A tool to grant and deny access to (parts of) an IMDI-based corpus. Supports advanced settings such as ACLs.
- Rights:
- Not specified
4. Annex - Annotation Exploration tool
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- toolService
- Description:
- A tool in the MPI web-based framework for archive exploration (and enrichment).
- Rights:
- Not specified
5. ANNIS
- Publisher:
- University of Potsdam, Dept. of Linguistics and Humboldt-University Berlin, Institut für deutsche Sprache und Linguistik
- Type:
- toolService
- Description:
- ANNIS2 is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, has been designed to provide access to the data of the SFB 632 - "Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts". Since information structure interacts with linguistic phenomena on many levels, ANNIS2 addresses the SFB's need to concurrently annotate, query and visualize data from such varied areas as syntax, semantics, morphology, prosody, referentiality, lexis and more. For projects working with spoken language, support for audio / video annotations is also required.
- Rights:
- Not specified
6. Annotate
- Creator:
- Roček, Martin
- Publisher:
- Charles University, Faculty of Arts
- Type:
- TEXT and toolService
- Subject:
- manuscripts, annotation, application, TEI, and JavaScript
- Language:
- No linguistic content
- Description:
- Annotate is a web and desktop application that should simplify the process of transforming photos of manuscripts to a browsable collection. It also allows users to annotate parts of the displayed images.
- Rights:
- GNU Library or "Lesser" General Public License 3.0 (LGPL-3.0), http://opensource.org/licenses/LGPL-3.0, and PUB
7. Anotatornia
- Publisher:
- Institute of Computer Science, Polish Academy of Sciences
- Type:
- toolService
- Description:
- Tool for manual online annotation of corpora at various linguistic levels. The levels currently implemented are: word-level and sentence-level segmentation, morphosyntax, and word sense disambiguation. Anotatornia implements sophisticated mechanisms for managing texts, annotators and conflicts.
- Rights:
- Not specified
8. Apertium Old Catalan morphological analyzer
- Publisher:
- Universidad de Alicante
- Type:
- toolService
- Subject:
- morphological analyzer
- Language:
- Catalan
- Description:
- A RESTful morphological analyzer for Old Catalan.
- Rights:
- Not specified
9. Araucaria
- Publisher:
- School of Computing, University of Dundee
- Type:
- toolService
- Subject:
- argument analyzer
- Description:
- Araucaria is a software tool for analysing arguments. It aids a user in reconstructing and diagramming an argument using a simple point-and-click interface. The software also supports argumentation schemes, and provides a user-customisable set of schemes with which to analyse arguments. Written in Java, released under the GNU General Public License.
- Rights:
- Not specified
10. Assigning lemmas and part-of-speech to wordform lists
- Type:
- toolService
- Language:
- Slovenian
- Description:
- online service
- Rights:
- Not specified
11. Atlas of Place Names
- Publisher:
- The Research Institute for the Languages of Finland
- Type:
- toolService
- Language:
- Finnish
- Description:
- The digital atlas illustrates the distribution of 234 common Finnish place-name elements based on data in the Names Archive.
- Rights:
- Not specified
12. Bibliografie zur deutschen Grammatik (BDG)
- Publisher:
- Institut für Deutsche Sprache
- Type:
- toolService
- Language:
- German
- Description:
- Online Bibliography, bibliographic database
- Rights:
- Not specified
13. BitPar
- Creator:
- Schmid, Helmut
- Publisher:
- University of Stuttgart
- Type:
- toolService
- Subject:
- parser
- Description:
- Statistical parser
- Rights:
- Not specified
14. BNF Converter
- Publisher:
- Språkbanken, Dept. of Swedish Language, Göteborg University
- Type:
- toolService
- Subject:
- compiler construction and grammar
- Description:
- The BNF Converter is a compiler construction tool generating a compiler front-end from a Labelled BNF grammar.
- Rights:
- Not specified
15. BulTreeBank Morphological Analyzer
- Creator:
- Simov, Kiril and Osenova, Petya
- Publisher:
- Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
- Type:
- toolService
- Description:
- Uses a morphological lexicon of Bulgarian (100,000 lemmas) compiled as a finite-state automaton in the CLaRK System. The text must first be tokenized; the analyzer is then applied to each token. It also includes guessers for unknown words and named-entity gazetteers. If the corresponding resources are available for another language, the analyzer can be adapted to it.
- Rights:
- Not specified
16. BulTreeBank Morphosyntactic Disambiguator
- Creator:
- Simov, Kiril, Osenova, Petya, and Simov, Alex
- Publisher:
- Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
- Type:
- toolService
- Description:
- This is a hybrid system: rules, neural network, rules. First, rules for the sure cases are applied; then a neural network disambiguator is applied; finally, rules repair the most frequent errors of the neural network. The rules are implemented as constraints in the CLaRK System. The neural network is an additional module implemented in Java. It is called CLaRK. It requires morphologically annotated input.
- Rights:
- Not specified
17. BulTreeBank Tokenizer
- Creator:
- Simov, Kiril
- Publisher:
- Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
- Type:
- toolService
- Description:
- The tokenizer covers all languages that use the Latin1, Latin2, Latin3 and Cyrillic tables of Unicode. It can be extended to cover other Unicode tables if necessary. It is implemented as a cascaded regular grammar in CLaRK. It recognizes over 60 token categories and is easy to adapt to new ones.
- Rights:
- Not specified
18. BUSCANEO
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Language:
- Catalan and Spanish
- Description:
- Tool for neologism extraction.
- Rights:
- Not specified
19. Bústia Neològica Escolar
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Language:
- Catalan and Spanish
- Description:
- Terminology management
- Rights:
- Not specified
20. Bwananet
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Language:
- Catalan, English, and Spanish
- Description:
- Tool for querying the Technical Corpus of the Institut Universitari de Lingüística Aplicada.
- Rights:
- Not specified
21. calcular_p_cue_class
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- Statistical analysis service: It calculates P(cue|class): probability of seeing a linguistic cue given a lexical class. This probability is computed given the occurrences of cues in a corpus (codified in the signatures file) and the information of belonging or not belonging of these words to different classes (codified in indicators file). The probability is computed for each studied cue in the signatures file and for each class in the indicators file.
- Rights:
- Not specified
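The calculation described for calcular_p_cue_class can be sketched as follows. The data layout is an assumption: the record does not document the signatures or indicators file formats, so they are modelled here as plain dictionaries, and the estimate used (cue occurrences among the words of a class, divided by all cue occurrences among those words) is one plausible reading of P(cue|class), not necessarily the one the service implements.

```python
from collections import defaultdict


def p_cue_given_class(signatures, indicators):
    """Estimate P(cue | class) from cue counts and class membership.

    signatures: word -> {cue: number of occurrences}  (hypothetical layout)
    indicators: word -> set of classes the word belongs to  (hypothetical layout)
    Returns: class -> {cue: probability}
    """
    cue_counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for word, cues in signatures.items():
        # Words with no entry in `indicators` belong to no class and are skipped.
        for cls in indicators.get(word, ()):
            for cue, n in cues.items():
                cue_counts[cls][cue] += n
                totals[cls] += n
    return {
        cls: {cue: n / totals[cls] for cue, n in cues.items()}
        for cls, cues in cue_counts.items()
        if totals[cls]
    }
```

The result contains one probability per studied cue and per class, mirroring the description's "for each studied cue in the signatures file and for each class in the indicators file".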
22. Catalan Annotated Corpora CQP
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- This RESTful service allows users to define a sub-corpus from different annotated corpora. The service includes a POS tag harmonisation process in which the original tags are converted to EAGLES/Parole format. The resulting sub-corpus is indexed using the IMS CWB tool. The user receives an ID which can be used by the CQP service to exploit the sub-corpus.
- Rights:
- Not specified
23. Catalan Digital Press
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- This RESTful service accesses part of the Hemeroteca Digital de l’Arxiu Municipal de Girona (digital press archive from the Girona city council), specifically Catalan press from 2003. The service uses the SRU protocol.
- Rights:
- Not specified
24. catdoc
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- Format conversion service: Word .doc to .txt converter
- Rights:
- Not specified
25. Cercador NEOROM
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Language:
- Catalan and Spanish
- Description:
- Search engine for the neologisms database of the NEOROM network. The network collects neologisms used in the press written in Romance languages from 2005 onwards.
- Rights:
- Not specified
26. Cercador OBNEO
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- Search engine of the BOBNEO data bank, a database of neologisms present in the mass media in Spanish and Catalan, written and oral, from 1992.
- Rights:
- Not specified
27. Česílko
- Creator:
- Hajič, Jan, Kuboň, Vladislav, and Homola, Petr
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService
- Subject:
- machine translation and Czech-Slovak translation
- Language:
- Czech
- Description:
- Česílko is a tool enabling the fast and efficient translation from one source language into many target languages, which are mutually related.
- Rights:
- Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0), http://creativecommons.org/licenses/by-nc-nd/3.0/, and PUB
28. Česílko 2.0 Shallow Transfer RBMT framework (opensource version)
- Creator:
- Vičič, Jernej, Kuboň, Vladislav, and Homola, Petr
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- Shallow Parse, Shallow Transfer Rule-Based Machine Translation, stochastic ranker, related languages, and toolbox
- Description:
- The system Česílko (language data and software tools) was first developed as an answer to a growing need for translation and localisation from one source language to many target languages. The original system belonged to the Shallow Parse, Shallow Transfer Rule-Based Machine Translation (RBMT) paradigm and was designed primarily for translation between related languages. The latest implementation uses a stochastic ranker, so technically it belongs to the hybrid machine translation paradigm, combining stochastic methods with the traditional Shallow Transfer RBMT methods. The system has been stripped of the accompanying language resources due to copyright restrictions. The data that is available is for demonstration purposes only.
- Rights:
- Not specified
29. Cesilko Web Service for Weblicht
- Creator:
- Hajič, Jan, Kuboň, Vladislav, and Homola, Petr
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService and service
- Subject:
- machine translation
- Description:
- Weblicht integration of Cesilko (http://hdl.handle.net/11858/00-097C-0000-0006-AAFE-A)
- Rights:
- Not specified
30. Chared
- Creator:
- Pomikálek, Jan
- Publisher:
- Masaryk University, NLP Centre
- Type:
- toolService and tool
- Subject:
- character encoding, character encoding detection, charset, and unicode
- Language:
- English
- Description:
- Chared is a software tool which can detect the character encoding of a text document, provided the language of the document is known. The language of the text has to be specified as an input parameter so that the corresponding language model can be used. The package contains models for a wide range of languages (currently 57, covering all major languages). Furthermore, it provides a training script to learn models for additional languages using a set of user-supplied sample HTML pages in the given language. The detection algorithm is based on determining the similarity of byte trigram vectors. In general, chared should be more accurate than character encoding detection tools that have no language constraints. This is an important advantage, allowing the precise character decoding needed for building large textual corpora. The tool has been used for building corpora in American Spanish, Arabic, Czech, French, Japanese, Russian, Tajik, and six Turkic languages consisting of 70 billion tokens altogether. Chared is open source software, licensed under the New BSD License and available for download (including the source code) at http://code.google.com/p/chared/. The research leading to this piece of software was published in Pomikálek, Jan and Vít Suchomel. chared: Character Encoding Detection with a Known Language. In Aleš Horák, Pavel Rychlý (eds.), RASLAN 2011. 5th ed. Brno, Czech Republic: Tribun EU, 2011, pp. 125-129. ISBN 978-80-263-0077-9. and PRESEMT, Lexical Computing Ltd
- Rights:
- BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB
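The idea behind Chared's detection step, comparing byte trigram vectors of a document against per-encoding models for a known language, can be sketched in a few lines of Python. This is not Chared's code or API: the function names are hypothetical, and cosine similarity is an assumed choice, since the record only says the algorithm determines the "similarity" of the vectors.

```python
import math
from collections import Counter


def byte_trigrams(data: bytes) -> Counter:
    """Count every sequence of three consecutive bytes."""
    return Counter(data[i:i + 3] for i in range(len(data) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(n * b[key] for key, n in a.items() if key in b)
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(
        sum(n * n for n in b.values())
    )
    return dot / norm if norm else 0.0


def build_models(sample_text: str, encodings):
    """One trigram vector per candidate encoding, from text in a known language."""
    return {enc: byte_trigrams(sample_text.encode(enc)) for enc in encodings}


def detect(data: bytes, models) -> str:
    """Return the encoding whose model is most similar to the document."""
    doc = byte_trigrams(data)
    return max(models, key=lambda enc: cosine(doc, models[enc]))
```

Because the models are built from text in one known language, encodings that differ in only a few code points (for example cp1250 and ISO-8859-2 for Czech) can still be told apart by the trigrams that contain those characters.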
31. CLaRK System - an XML-based system for Corpora Development
- Publisher:
- Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
- Type:
- toolService
- Subject:
- corpus development
- Description:
- The CLaRK System incorporates several technologies: XML technology, Unicode, Cascaded Regular Grammars, and Constraints over XML Documents. On the basis of these technologies the following tools are implemented: XML Editor, Unicode Tokeniser, Sorting tool, Removing and Extracting tool, Concordancer, XSLT tool, Cascaded Regular Grammar tool, etc.
1 Unicode tokenization. To make it possible to impose constraints over textual nodes and to segment them in a meaningful way, the CLaRK System supports a user-defined hierarchy of tokenisers. At the most basic level the user can define a tokeniser in terms of a set of token types. In this basic tokeniser each token type is defined by a set of Unicode symbols. Above this basic level, the user can define other tokenisers whose token types are defined as regular expressions over the tokens of some other tokeniser, the so-called parent tokeniser.
2 Regular Grammars. The regular grammars are the basic mechanism for linguistic processing of the content of an XML document within the system. The regular grammar processor applies a set of rules over the content of some elements in the document and incorporates the categories of the rules back into the document as XML mark-up. Before the grammar rules are applied, the content is processed as follows: textual nodes are tokenized with respect to an appropriate tokeniser, and element nodes are textualized on the basis of XPath expressions that determine the important information about the element. The recognized word is substituted by new XML mark-up, which may or may not contain the word.
3 Constraints. The constraints implemented in the CLaRK System are generally based on the XPath language. XPath expressions are used to select data within one or several XML documents, and predicates are then evaluated over that data. There are two modes of using a constraint. In the first mode the constraint is used for a validity check, similar to the validity check based on a DTD or an XML schema. In the second mode, the constraint is used to support changing the document so that it satisfies the constraint. There are three types of constraints implemented in the system: regular expression constraints, number restriction constraints, and value restriction constraints.
4 Macro Language. In the CLaRK System the tools support a mechanism for describing their settings. On the basis of these descriptions (called queries) a tool can be applied simply by pointing to a certain description record. Each query contains the states of all settings and options of the corresponding tool. Given such queries, a special tool can combine and apply them in groups (macros). During application the queries are executed successively, and the result of one application is the input for the next. For better control over the process of applying several queries in one macro, there are several conditional operators. These operators can determine the next query to apply depending on certain conditions. When the condition of such an operator is satisfied, execution continues from a location defined in the operator. The mechanism for addressing queries is based on user-defined labels. When a condition is not satisfied, the operator is ignored and the process continues from the position following the operator. In this way constructions like IF-THEN-ELSE and WHILE-DO can easily be expressed. The system supports five types of control operators:
IF (XPath): the condition is an XPath expression which is evaluated on the current working document. If the result is a non-empty node-set, a non-empty string, a positive number or a true boolean value, the condition is satisfied.
IF NOT (XPath): the same kind of condition as the previous one, but the result is negated.
IF CHANGED: the condition is satisfied if the preceding operation has changed the current working document or has produced a non-empty result document (depending on the operation).
IF NOT CHANGED: the condition is satisfied if the previous operation either did not change the working document or did not produce a non-empty result.
GOTO: unconditionally changes the execution position.
Each macro defined in the system can have its own query and can be incorporated in another macro. In this way a limited form of subroutine can be implemented. The new version of CLaRK will support server applications and calls to/from external programs.
- Rights:
- Not specified
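The control flow of the macro language described for CLaRK (queries executed in sequence, with label-based jumps driven by the five control operators) can be modelled with a small interpreter. This is a hedged sketch, not CLaRK's implementation: the tuple-based step format and the `run_macro` name are hypothetical, documents are plain Python values, and XPath conditions are replaced by ordinary Python predicates.

```python
def run_macro(steps, doc, max_steps=1000):
    """Execute a list of macro steps on a document and return the result.

    Step forms (hypothetical encoding):
      ("LABEL", name)               marks a jump target
      ("QUERY", func)               applies func(doc) -> new doc
      ("IF", predicate, label)      jumps if predicate(doc) is true
      ("IF NOT", predicate, label)  jumps if predicate(doc) is false
      ("IF CHANGED", label)         jumps if the last query changed doc
      ("IF NOT CHANGED", label)     jumps if the last query left doc unchanged
      ("GOTO", label)               jumps unconditionally
    """
    labels = {step[1]: i for i, step in enumerate(steps) if step[0] == "LABEL"}
    position, changed = 0, False
    for _ in range(max_steps):
        if position >= len(steps):
            return doc
        op, *args = steps[position]
        position += 1  # default: continue with the step after the operator
        if op == "QUERY":
            new_doc = args[0](doc)
            changed = new_doc != doc
            doc = new_doc
            continue
        jump = (
            op == "GOTO"
            or (op == "IF" and args[0](doc))
            or (op == "IF NOT" and not args[0](doc))
            or (op == "IF CHANGED" and changed)
            or (op == "IF NOT CHANGED" and not changed)
        )
        if jump:
            position = labels[args[-1]]
    raise RuntimeError("macro did not terminate within max_steps")


# A WHILE-DO construction: repeat one query until it no longer changes the document.
collapse_spaces = [
    ("LABEL", "loop"),
    ("QUERY", lambda text: text.replace("  ", " ", 1)),
    ("IF CHANGED", "loop"),
]
```

Running `run_macro(collapse_spaces, "a    b")` repeats the query until the text stops changing, which is the WHILE-DO pattern the description mentions; an IF-THEN-ELSE can be built the same way from an IF, a GOTO and two labels.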
32. CLaRK System - XML-based system for Corpora Development
- Creator:
- Simov, Kiril, Simov, Alex, and Kouylekov, Milen
- Publisher:
- Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
- Type:
- toolService
- Description:
- The CLaRK System incorporates several technologies: XML technology, Unicode, Cascaded Regular Grammars, and Constraints over XML Documents. On the basis of these technologies the following tools are implemented: XML Editor, Unicode Tokeniser, Sorting tool, Removing and Extracting tool, Concordancer, XSLT tool, Cascaded Regular Grammar tool, etc.
1 Unicode tokenization. To make it possible to impose constraints over textual nodes and to segment them in a meaningful way, the CLaRK System supports a user-defined hierarchy of tokenisers. At the most basic level the user can define a tokeniser in terms of a set of token types. In this basic tokeniser each token type is defined by a set of Unicode symbols. Above this basic level, the user can define other tokenisers whose token types are defined as regular expressions over the tokens of some other tokeniser, the so-called parent tokeniser.
2 Regular Grammars. The regular grammars are the basic mechanism for linguistic processing of the content of an XML document within the system. The regular grammar processor applies a set of rules over the content of some elements in the document and incorporates the categories of the rules back into the document as XML mark-up. Before the grammar rules are applied, the content is processed as follows: textual nodes are tokenized with respect to an appropriate tokeniser, and element nodes are textualized on the basis of XPath expressions that determine the important information about the element. The recognized word is substituted by new XML mark-up, which may or may not contain the word.
3 Constraints. The constraints implemented in the CLaRK System are generally based on the XPath language. XPath expressions are used to select data within one or several XML documents, and predicates are then evaluated over that data. There are two modes of using a constraint. In the first mode the constraint is used for a validity check, similar to the validity check based on a DTD or an XML schema. In the second mode, the constraint is used to support changing the document so that it satisfies the constraint. There are three types of constraints implemented in the system: regular expression constraints, number restriction constraints, and value restriction constraints.
4 Macro Language. In the CLaRK System the tools support a mechanism for describing their settings. On the basis of these descriptions (called queries) a tool can be applied simply by pointing to a certain description record. Each query contains the states of all settings and options of the corresponding tool. Given such queries, a special tool can combine and apply them in groups (macros). During application the queries are executed successively, and the result of one application is the input for the next. For better control over the process of applying several queries in one macro, there are several conditional operators. These operators can determine the next query to apply depending on certain conditions. When the condition of such an operator is satisfied, execution continues from a location defined in the operator. The mechanism for addressing queries is based on user-defined labels. When a condition is not satisfied, the operator is ignored and the process continues from the position following the operator. In this way constructions like IF-THEN-ELSE and WHILE-DO can easily be expressed. The system supports five types of control operators:
IF (XPath): the condition is an XPath expression which is evaluated on the current working document. If the result is a non-empty node-set, a non-empty string, a positive number or a true boolean value, the condition is satisfied.
IF NOT (XPath): the same kind of condition as the previous one, but the result is negated.
IF CHANGED: the condition is satisfied if the preceding operation has changed the current working document or has produced a non-empty result document (depending on the operation).
IF NOT CHANGED: the condition is satisfied if the previous operation either did not change the working document or did not produce a non-empty result.
GOTO: unconditionally changes the execution position.
Each macro defined in the system can have its own query and can be incorporated in another macro. In this way a limited form of subroutine can be implemented. The new version of CLaRK will support server applications and calls to/from external programs.
- Rights:
- Not specified
33. COLDIC
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- Tool for dictionary management
- Rights:
- Not specified
34. CorPipe 23 multilingual CorefUD 1.1 model (corpipe23-corefud1.1-231206)
- Creator:
- Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- coreference resolution, CorPipe, and CorefUD
- Language:
- Catalan, Czech, German, English, Spanish, French, Hungarian, Lithuanian, Norwegian Bokmål, Norwegian Nynorsk, Polish, Russian, and Turkish
- Description:
- The `corpipe23-corefud1.1-231206` is a `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released under the CC BY-NC-SA 4.0 license. The model is language agnostic (no _corpus id_ on input), so it can be used to predict coreference in any `mT5` language (for zero-shot evaluation, see the paper). However, note that the empty nodes must be present already on input, they are not predicted (the same settings as in the CRAC23 shared task).
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
35. Corpus query for Estonian corpora
- Publisher:
- University of Tartu
- Type:
- toolService
- Language:
- Estonian
- Description:
- Web application for querying the automatically morphologically disambiguated Mixed corpus of Estonian
- Rights:
- Not specified
36. Corpus Work Bench CWB (CQP)
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- This SOAP service implements the IMS Open Corpus Workbench (CWB), a collection of open-source tools for managing and querying large text corpora (ranging from 10 million to 2 billion words) with linguistic annotations. Its central component is the flexible and efficient query processor CQP. The service makes it possible to index a new corpus and query it.
- Rights:
- Not specified
37. CorpusExplorer
- Creator:
- Rüdiger, Jan Oliver
- Publisher:
- Jan Oliver Rüdiger
- Type:
- tool and toolService
- Subject:
- Corpus Linguistics, NLP, conll, tei, XML, nlp, Natural Language Processing, linguistics, Linguistics, Computational Linguistics, corpus processing, tagger, POS tagger, lemmatization, text cleaning, CommonCrawl, epub, JSON, Twitter, Pandoc, Wikipedia, digital data, DTA, DSpin, MySQL, ElasticSearch, TextGrid, text corpora, TigerXML, and WeblichtXML
- Language:
- German, English, French, Italian, Dutch, Spanish, Polish, Arabic, Chinese, and Portuguese
- Description:
- Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks such as text acquisition, cleaning or tagging are completely automated. The simple interface supports the use in university teaching and leads users/students to fast and substantial results. The CorpusExplorer is open for many standards (XML, CSV, JSON, R, etc.) and also offers its own software development kit (SDK). Source code available at https://github.com/notesjor/corpusexplorer2.0
- Rights:
- Not specified
38. Croatian Lemmatization Server
- Publisher:
- University of Zagreb, Faculty of Humanities and Social Sciences
- Type:
- toolService
- Language:
- Croatian
- Description:
- Online service for lemmatization and full POS or MSD tagging of Croatian texts.
- Rights:
- Not specified
39. CST's lemmatiser
- Publisher:
- Center for Sprogteknologi, University of Copenhagen
- Type:
- toolService
- Language:
- Danish, Dutch, English, German, Modern Greek (1453-), Icelandic, Norwegian, Russian, Slovenian, and Swedish
- Description:
- 1) Fully automatic rule-based lemmatization of inflected languages 2) Fully automatic training of lemmatization rules based on a full form-lemma list
- Rights:
- Not specified
40. CST's lemmatizer
- Creator:
- Jongejan, Bart
- Publisher:
- Københavns Universitet, Center for Sprogteknologi (CST)
- Type:
- toolService
- Description:
- 1) Fully automatic rule-based lemmatization of inflected languages 2) Fully automatic training of lemmatization rules based on a full form-lemma list
- Rights:
- Not specified
41. CUBBITT Translation Models (en-cs) (v1.0)
- Creator:
- Popel, Martin, Tomková, Markéta, Tomek, Jakub, Kaiser, Łukasz, Uszkoreit, Jakob, Bojar, Ondřej, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- machine translation, neural machine translation, transformer, and cubbitt
- Language:
- English and Czech
- Description:
- CUBBITT En-Cs translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2014 (BLEU): en->cs: 27.6 cs->en: 34.4 (Evaluated using multeval: https://github.com/jhclark/multeval)
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
42. CUBBITT Translation Models (en-fr) (v1.0)
- Creator:
- Popel, Martin, Tomková, Markéta, Tomek, Jakub, Kaiser, Łukasz, Uszkoreit, Jakob, Bojar, Ondřej, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- machine translation, neural machine translation, transformer, and cubbitt
- Language:
- English and French
- Description:
- CUBBITT En-Fr translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2014 (BLEU): en->fr: 38.2 fr->en: 36.7 (Evaluated using multeval: https://github.com/jhclark/multeval)
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
43. CUBBITT Translation Models (en-pl) (v1.0)
- Creator:
- Popel, Martin, Tomková, Markéta, Tomek, Jakub, Kaiser, Łukasz, Uszkoreit, Jakob, Bojar, Ondřej, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- machine translation, neural machine translation, transformer, and cubbitt
- Language:
- English and Polish
- Description:
- CUBBITT En-Pl translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2020 (BLEU): en->pl: 12.3 pl->en: 20.0 (Evaluated using multeval: https://github.com/jhclark/multeval)
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
44. Cyril Belica : Kookkurrenzdatenbank CCDB
- Publisher:
- Institut für Deutsche Sprache
- Type:
- toolService
- Language:
- German
- Description:
- A co-occurrence database, developed by the Institut für Deutsche Sprache, for research in the field of collocation analysis in modern German. The database holds over 200,000 analysed words that can be browsed or searched and shown in context.
- Rights:
- Not specified
45. Czech image captioning, machine translation, and sentiment analysis (Neural Monkey models)
- Creator:
- Libovický, Jindřich, Rosa, Rudolf, Helcl, Jindřich, and Popel, Martin
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- suiteOfTools and toolService
- Subject:
- sentiment analysis, machine translation, image captioning, neural networks, transformer, and Neural Monkey
- Language:
- Czech and English
- Description:
- This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving three NLP tasks: machine translation, image captioning, and sentiment analysis. The models are trained on standard datasets and achieve state-of-the-art or near state-of-the-art performance in the tasks. The models are described in the accompanying paper. The same models can also be invoked via the online demo: https://ufal.mff.cuni.cz/grants/lsd There are several separate ZIP archives here, each containing one model solving one of the tasks for one language. To use a model, you first need to install Neural Monkey: https://github.com/ufal/neuralmonkey To ensure correct functioning of the model, please use the exact version of Neural Monkey specified by the commit hash stored in the 'git_commit' file in the model directory. Each model directory contains a 'run.ini' Neural Monkey configuration file, to be used to run the model. See the Neural Monkey documentation to learn how to do that (you may need to update some paths to correspond to your filesystem organization). The 'experiment.ini' file, which was used to train the model, is also included. Then there are files containing the model itself, files containing the input and output vocabularies, etc. For the sentiment analyzers, you should tokenize your input data using the Moses tokenizer: https://pypi.org/project/mosestokenizer/ For the machine translation, you do not need to tokenize the data, as this is done by the model. 
For image captioning, you need to: - download a trained ResNet: http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz - clone the git repository with TensorFlow models: https://github.com/tensorflow/models - preprocess the input images with the Neural Monkey 'scripts/imagenet_features.py' script (https://github.com/ufal/neuralmonkey/blob/master/scripts/imagenet_features.py) -- you need to specify the path to ResNet and to the TensorFlow models to this script Feel free to contact the authors of this submission in case you run into problems!
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
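The version-pinning step in the description above (use the Neural Monkey commit recorded in the model's 'git_commit' file) can be sketched as follows. This is a minimal sketch, assuming the file holds nothing but the commit hash as plain text; the function names and directory arguments are hypothetical and not part of Neural Monkey.

```python
import subprocess
from pathlib import Path


def read_pinned_commit(model_dir):
    """Return the commit hash stored in the model's 'git_commit' file.

    Assumption: the file contains only the hash, possibly followed by a newline.
    """
    return (Path(model_dir) / "git_commit").read_text(encoding="utf-8").strip()


def checkout_command(neuralmonkey_repo, commit):
    """Build the git command that switches a local clone to the pinned commit."""
    return ["git", "-C", str(neuralmonkey_repo), "checkout", commit]


def pin_neuralmonkey(model_dir, neuralmonkey_repo):
    """Check out the commit a model was trained with (both paths are hypothetical)."""
    commit = read_pinned_commit(model_dir)
    subprocess.run(checkout_command(neuralmonkey_repo, commit), check=True)
    return commit
```

Calling `pin_neuralmonkey("path/to/model", "path/to/neuralmonkey")` before running the model's 'run.ini' would keep the toolkit and the model in step; both paths are placeholders for your own filesystem layout.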
46. Czech image captioning, machine translation, sentiment analysis and summarization (Neural Monkey models)
- Creator:
- Libovický, Jindřich, Rosa, Rudolf, Helcl, Jindřich, and Popel, Martin
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- suiteOfTools and toolService
- Subject:
- sentiment analysis, machine translation, image captioning, neural networks, transformer, Neural Monkey, and summarization
- Language:
- Czech and English
- Description:
- This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving four NLP tasks: machine translation, image captioning, sentiment analysis, and summarization. The models are trained on standard datasets and achieve state-of-the-art or near state-of-the-art performance in the tasks. The models are described in the accompanying paper. The same models can also be invoked via the online demo: https://ufal.mff.cuni.cz/grants/lsd In addition to the models presented in the referenced paper (developed and published in 2018), we include models for automatic news summarization for Czech and English developed in 2019. The Czech models were trained using the SumeCzech dataset (https://www.aclweb.org/anthology/L18-1551.pdf), the English models were trained using the CNN-Daily Mail corpus (https://arxiv.org/pdf/1704.04368.pdf) using the standard recurrent sequence-to-sequence architecture. There are several separate ZIP archives here, each containing one model solving one of the tasks for one language. To use a model, you first need to install Neural Monkey: https://github.com/ufal/neuralmonkey To ensure correct functioning of the model, please use the exact version of Neural Monkey specified by the commit hash stored in the 'git_commit' file in the model directory. Each model directory contains a 'run.ini' Neural Monkey configuration file, to be used to run the model. See the Neural Monkey documentation to learn how to do that (you may need to update some paths to correspond to your filesystem organization). The 'experiment.ini' file, which was used to train the model, is also included. Then there are files containing the model itself, files containing the input and output vocabularies, etc. For the sentiment analyzers, you should tokenize your input data using the Moses tokenizer: https://pypi.org/project/mosestokenizer/ For the machine translation, you do not need to tokenize the data, as this is done by the model. 
For image captioning, you need to: - download a trained ResNet: http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz - clone the git repository with TensorFlow models: https://github.com/tensorflow/models - preprocess the input images with the Neural Monkey 'scripts/imagenet_features.py' script (https://github.com/ufal/neuralmonkey/blob/master/scripts/imagenet_features.py) -- you need to specify the path to ResNet and to the TensorFlow models to this script The summarization models require input that is tokenized with Moses Tokenizer (https://github.com/alvations/sacremoses) and lower-cased. Feel free to contact the authors of this submission in case you run into problems!
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
47. Czech Morphological Analyzer v1
- Creator:
- Hajič, Jan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService and service
- Subject:
- morphological analysis and lemmatization
- Language:
- Czech
- Description:
- One of the very first steps in automatic processing of Czech text is morphological analysis and lemmatization.
- Rights:
- Not specified
48. Czech PDT-C 1.0 Model for UDPipe 2 (2023-11-16)
- Creator:
- Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- tokenizer, POS tagger, lemmatization, parser, dependency parser, MorfFlex CZ 2.0, and PDT-C 1.0
- Language:
- Czech
- Description:
- Tokenizer, POS Tagger, Lemmatizer, and Parser model based on the PDT-C 1.0 treebank (https://hdl.handle.net/11234/1-3185). The model documentation including performance can be found at https://ufal.mff.cuni.cz/udpipe/2/models#czech_pdtc1.0_model . To use these models, you need UDPipe version 2.1, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
49. Dendrarium
- Publisher:
- Institute of Computer Science, Polish Academy of Sciences
- Type:
- toolService
- Description:
- Coordinates the work of a group of linguists who select appropriate parse trees from the many generated ones. It assigns parts of the task, signals differences in annotation, and allows a supervisor to correct them.
- Rights:
- Not specified
50. Depfix: Automatic Post-editing of SMT
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- machine translation, post-editing, Treex, morphology, and parsing
- Language:
- English and Czech
- Description:
- Depfix is a tool for automatic post-editing of SMT output. See the project website for more information.
- Rights:
- GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB
51. Dialogy.Org
- Creator:
- Peterek, Nino
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService and service
- Subject:
- multimedia corpora search service
- Description:
- The Dialogy.Org system allows users to search in transcribed audio-visual corpora. Dialogy.Org is web-based, so no additional software needs to be installed on your computer. Flash Player is required to play audio or video recordings. This work has been using language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
- Rights:
- Not specified
52. Digital archive of Finnish Folk Tunes
- Publisher:
- Department of Music, University of Jyväskylä
- Type:
- toolService
- Language:
- Finnish
- Description:
- Digitized versions of Finnish folk tunes and their relevant details (notation, key, meter, place of collection, lyrics, collector). The archive contains 8613 Finnish folk tunes (including part of the lyrics).
- Rights:
- Not specified
53. DiSi: Flexible Dialogue System
- Publisher:
- Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
- Type:
- toolService
- Language:
- Spanish
- Description:
- Dialogue manager
- Rights:
- Not specified
54. Dspace modifications for use of EPIC handles
- Creator:
- Pajas, Petr
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService
- Subject:
- DSpace, handle, and EPIC
- Description:
- Modifications to DSpace made by Petr Pajas in order to support the pidconsortium.eu PID handle system instead of the default handle.net system used by DSpace.
- Rights:
- BSD 2-Clause "Simplified" or "FreeBSD" license, http://opensource.org/licenses/BSD-2-Clause, and PUB
55. DTAG dependency treebank tool
- Publisher:
- Copenhagen Business School
- Type:
- toolService
- Description:
- DTAG is a versatile annotation tool that supports manual and semi-automatic annotation of a wide range of linguistic phenomena, including the annotation of syntax, discourse, coreference, morphology, and word and phrase alignments. It includes commands for editing general labeled graphs and graph alignments, comparing annotations, managing annotation tasks, and interfacing with a revision control system. Its visualization component can display graphs and alignments for entire texts in a compact format, with a highly flexible and configurable formatting scheme. It also provides a powerful search-replace mechanism with queries based on full first-order logic, which can be used to search for linguistic constructions and automatically apply graph transformations to collections of annotated graphs. The visualization component does not currently support characters outside the ISO-latin character set.
- Rights:
- Not specified
56. DZ Interset
- Creator:
- Zeman, Daniel
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService and tool
- Subject:
- morphology, NLP, and Perl
- Description:
- DZ Interset is a means of converting among various tag sets in natural language processing. The core idea is similar to interlingua-based machine translation. DZ Interset defines a set of features that are encoded by the various tag sets. The set of features should be as universal as possible. It does not need to encode everything that is encoded by any tag set but it should encode all information that people may want to access and/or port from one tag set to another. New tag sets are attached by writing a driver for them. Once the driver is ready, you can easily convert tags between the new set and any other set for which you also have a driver. This reusability is an obvious advantage over writing a targeted conversion procedure each time you need to convert between a particular pair of tag sets. Supported by grant MSM 0021620838 of the Ministry of Education of the Czech Republic.
- Rights:
- GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB
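The driver idea in the DZ Interset description above can be illustrated with a small sketch. It is not Interset's actual Perl API: the `Driver` class, the two toy tag sets, and the feature names are all hypothetical, chosen only to show how a shared feature set lets any two drivers be combined.

```python
class Driver:
    """Maps the tags of one tag set to and from a shared feature dictionary."""

    def __init__(self, table):
        self.table = table  # tag -> dict of universal features

    def decode(self, tag):
        """Tag -> universal features."""
        return dict(self.table[tag])

    def encode(self, features):
        """Universal features -> closest tag (matching features minus conflicting ones)."""
        def score(item):
            tag_features = item[1]
            hits = sum(1 for k, v in tag_features.items() if features.get(k) == v)
            return hits - (len(tag_features) - hits)
        return max(self.table.items(), key=score)[0]


def convert(tag, source, target):
    """Convert a tag by going through the shared features (the 'interlingua')."""
    return target.encode(source.decode(tag))


# Two toy tag sets, invented for this example.
toy_a = Driver({
    "NN": {"pos": "noun", "number": "sing"},
    "NNS": {"pos": "noun", "number": "plur"},
    "VB": {"pos": "verb"},
    "VBD": {"pos": "verb", "tense": "past"},
})
toy_b = Driver({
    "N.sg": {"pos": "noun", "number": "sing"},
    "N.pl": {"pos": "noun", "number": "plur"},
    "V": {"pos": "verb"},
})

plural_in_b = convert("NNS", toy_a, toy_b)
past_verb_in_b = convert("VBD", toy_a, toy_b)  # tense is lost: toy_b cannot express it
```

With this layout, supporting a third tag set means writing one more table rather than a converter for every pair of tag sets.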
57. EFCL Channelizer
- Creator:
- Klusáček, David
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- Fast Channelizer, Filterbank, ASR Front End, Software Defined Radio, Polyphase Filter, Frequency Multiplexing, Audio Denoising, High Performance Computing, HPC, SDR, FFT, FFTW, SIMD, AVX, SSE, and NEON
- Description:
- Extremely fast digital audio channelizer implementation, usable as a building block for experimental ASR front-ends or signal denoising applications. Also applicable in software defined radios, due to its high throughput. It comes in the form of a C/C++ library and an executable example program which reads an input stream, splits it into equidistant frequency channels, and emits their data to the output. Features: (1) Hand-tuned SIMD-aware assembly for x86 (SSE) and x86-64 (AVX) as well as for ARM (NEON) processors. (2) Generic non-SIMD C++ implementation for other architectures. (3) Capable of taking advantage of multicore CPUs. (4) Fully configurable number of channels and output decimation rate. (5) User-supplied FIR of the channel separation filter, which makes it possible to specify the width of the channels and whether they should overlap or be separated. (6) Input and output signal samples are treated as complex numbers. (7) Speed over 750 complex MS/s achieved on Core i7 4710HQ @ 2.5GHz, when channelizing into 72 output channels with a FIR length of 1152 samples, using 3 computing threads. (8) Runs under Linux OS.
- Rights:
- Mozilla Public License 2.0, http://opensource.org/licenses/MPL-2.0, and PUB
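What a channelizer with a configurable channel count, decimation rate, and user-supplied FIR computes can be sketched in a few lines. This is a slow, direct-form illustration (mix down, filter, decimate), not the library's optimized polyphase/SIMD implementation; the function names and the demo signal are assumptions made for this example.

```python
import cmath


def channelize(samples, num_channels, decimation, fir):
    """Split a complex sample stream into equidistant frequency channels.

    For each channel k the input is mixed down by k/num_channels cycles per
    sample, filtered with `fir`, and only every `decimation`-th output is kept.
    """
    outputs = [[] for _ in range(num_channels)]
    # Start once the FIR window is fully covered by input samples.
    for n in range(len(fir) - 1, len(samples), decimation):
        for k in range(num_channels):
            acc = 0j
            for m, tap in enumerate(fir):
                t = n - m
                # The mixer shifts the centre of channel k down to 0 Hz.
                mixer = cmath.exp(-2j * cmath.pi * k * t / num_channels)
                acc += tap * samples[t] * mixer
            outputs[k].append(acc)
    return outputs


def channel_energy(channel):
    """Total energy of one channel's output samples."""
    return sum(abs(v) ** 2 for v in channel)


# Demo: a complex tone centred on channel 3 of 8, with a boxcar FIR.
num_channels = 8
tone = [cmath.exp(2j * cmath.pi * 3 * t / num_channels) for t in range(64)]
boxcar = [1.0 / num_channels] * num_channels
channels = channelize(tone, num_channels, decimation=4, fir=boxcar)
energies = [channel_energy(c) for c in channels]
```

In the demo, the tone's energy ends up in channel 3, while the other channels stay at numerical noise level. A longer, better-shaped FIR would be needed for signals that do not sit exactly on a channel centre.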
58. ELAN
- Publisher:
- Max Planck Institute for Psycholinguistics
- Type:
- toolService
- Description:
- Multimodal annotation tool
- Rights:
- Not specified
59. ElixirFM
- Creator:
- Smrž, Otakar, Bielický, Viktor, and Buckwalter, Tim
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService
- Subject:
- Arabic morphology and ElixirFM
- Language:
- Arabic
- Description:
- ElixirFM is a high-level implementation of Functional Arabic Morphology documented at http://elixir-fm.wiki.sourceforge.net/. The core of ElixirFM is written in Haskell, while interfaces in Perl support lexicon editing and other interactions.
- Rights:
- http://opensource.org/licenses/GPL-3.0
60. Ellogon
- Type:
- toolService
- Description:
- Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, Ellogon offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (like creating and embedding lexicons), tools for creating annotated corpora, accessing databases, comparing annotated data, or transforming linguistic information into vectors for use with various machine learning algorithms.
- Rights:
- Not specified
61. EMU Speech Database System
- Publisher:
- Institute of Phonetics and Speech Processing, LMU Munich
- Type:
- toolService
- Description:
- EMU is a collection of software tools for the creation, manipulation and analysis of speech databases. At the core of EMU is a database search engine which allows the researcher to find various speech segments based on the sequential and hierarchical structure of the utterances in which they occur. EMU includes an interactive labeller which can display spectrograms and other speech waveforms, and which allows the creation of hierarchical, as well as sequential, labels for a speech utterance.
- Rights:
- Not specified
62. English-Latvian SMT system
- Publisher:
- Institute of Mathematics and Computer Science, University of Latvia
- Type:
- toolService
- Language:
- English
- Description:
- A factored English-Latvian SMT system that uses the Moses decoder, trained on JRC-Acquis and some other parallel texts.
- Rights:
- Not specified
63. English-Lithuanian Machine Translation Service
- Publisher:
- Center of Computational Linguistics, Vytautas Magnus University
- Type:
- toolService
- Language:
- English and Lithuanian
- Description:
- On-line freely accessible machine translation tool for translating English webpages or texts into Lithuanian.
- Rights:
- Not specified
64. Estació Terminus
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Language:
- Catalan and Spanish
- Description:
- Tool for terminology management.
- Rights:
- Not specified
65. ESTEN
- Publisher:
- Centre de Terminologia TERMCAT and Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Language:
- Catalan
- Description:
- Terminology management.
- Rights:
- Not specified
66. Estonian Text-to-Speech Synthesiser for the Blind
- Publisher:
- Laboratory of Phonetics and Speech Technology, Tallinn University of Technology
- Type:
- toolService
- Rights:
- Not specified
67. Etymological Reference Database
- Publisher:
- The Research Institute for the Languages of Finland
- Type:
- toolService
- Language:
- Finnish
- Rights:
- Not specified
68. EvaLatin 2020 models for UDPipe 2 (2020-08-31)
- Creator:
- Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- POS tagger, lemmatization, and tagger
- Language:
- Latin
- Description:
- POS Tagger and Lemmatizer models for EvaLatin2020 data (https://github.com/CIRCSE/LT4HALA). The model documentation including performance can be found at https://ufal.mff.cuni.cz/udpipe/2/models#evalatin20_models . To use these models, you need UDPipe version at least 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
69. EVALD 1.0
- Creator:
- Rysová, Kateřina, Mírovský, Jiří, Novák, Michal, and Rysová, Magdaléna
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and native speakers
- Language:
- Czech
- Description:
- EVALD 1.0 serves for automatic evaluation of surface coherence (cohesion) in Czech texts written by native speakers of Czech.
- Rights:
- BSD 2-Clause "Simplified" or "FreeBSD" license, http://opensource.org/licenses/BSD-2-Clause, and PUB
70. EVALD 1.0 for Foreigners
- Creator:
- Rysová, Kateřina, Mírovský, Jiří, Novák, Michal, and Rysová, Magdaléna
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and non-native speakers
- Language:
- Czech
- Description:
- EVALD 1.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
- Rights:
- BSD 2-Clause "Simplified" or "FreeBSD" license, http://opensource.org/licenses/BSD-2-Clause, and PUB
71. EVALD 2.0
- Creator:
- Novák, Michal, Rysová, Kateřina, Mírovský, Jiří, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, UFAL
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and native speakers
- Language:
- Czech
- Description:
- EVALD 2.0 serves for automatic evaluation of surface coherence (cohesion) in Czech texts written by native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
72. EVALD 2.0 for Foreigners
- Creator:
- Novák, Michal, Rysová, Kateřina, Mírovský, Jiří, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, UFAL
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and non-native speakers
- Language:
- Czech
- Description:
- EVALD 2.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
73. EVALD 3.0 – Evaluator of Discourse
- Creator:
- Mírovský, Jiří, Novák, Michal, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, UFAL
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and native speakers
- Language:
- Czech
- Description:
- EVALD 3.0 serves for automatic evaluation of surface coherence (cohesion) in Czech texts written by native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
74. EVALD 3.0 for Foreigners – Evaluator of Discourse
- Creator:
- Mírovský, Jiří, Novák, Michal, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, UFAL
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and non-native speakers
- Language:
- Czech
- Description:
- EVALD 3.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
75. EVALD 4.0 – Evaluator of Discourse
- Creator:
- Novák, Michal, Mírovský, Jiří, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and non-native speakers
- Language:
- Czech
- Description:
- EVALD 4.0 serves for automatic evaluation of surface coherence (cohesion) in Czech texts written by native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
76. EVALD 4.0 for Beginners – Evaluator of Discourse
- Creator:
- Novák, Michal, Mírovský, Jiří, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and non-native speakers
- Language:
- Czech
- Description:
- EVALD 4.0 for Beginners is software for automatic evaluation of Czech texts written by non-native speakers of Czech who are language beginners.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
77. EVALD 4.0 for Foreigners – Evaluator of Discourse
- Creator:
- Novák, Michal, Mírovský, Jiří, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and non-native speakers
- Language:
- Czech
- Description:
- EVALD 4.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
78. eXist
- Publisher:
- Språkbanken, Dept. of Swedish Language, Göteborg University
- Type:
- toolService
- Description:
- eXist-db is an open source database management system entirely built on XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
- Rights:
- Not specified
79. Extract
- Creator:
- Forsberg, Markus and Ranta, Aarne
- Publisher:
- Språkbanken, Dept. of Swedish Language, Göteborg University
- Type:
- toolService
- Subject:
- morphology extraction
- Description:
- Extract is a tool for supervised morphological lexicon extraction from raw text data.
- Rights:
- Not specified
80. Fairytale child
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService and tool
- Subject:
- dialogue system, morphological generation, Treex, morphological analysis, and interactive
- Language:
- English and Czech
- Description:
- Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible. The work has been supported by GAUK 1572314 and SVV 260104. It has been using language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
- Rights:
- GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB
81. Fairytale child (2014-09-26)
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService and tool
- Subject:
- dialogue system, morphological generation, Treex, morphological analysis, and interactive
- Language:
- English and Czech
- Description:
- Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible. The work has been supported by GAUK 1572314 and SVV 260104. It has been using language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
- Rights:
- GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB
82. Fairytale child (2014-09-30)
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService and tool
- Subject:
- dialogue system, morphological generation, Treex, morphological analysis, and interactive
- Language:
- English and Czech
- Description:
- Fairytale Child is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible. The work has been supported by GAUK 1572314 and SVV 260104. It has been using language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
- Rights:
- GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB
83. Fairytale child (2014-11-21)
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService and tool
- Subject:
- dialogue system, morphological generation, Treex, morphological analysis, and interactive
- Language:
- English and Czech
- Description:
- Fairytale Child is a simple chatbot that simulates a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. It nevertheless remembers what it was told and tries to show this where possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible. The work has been supported by GAUK 1572314 and SVV 260104. It uses language resources developed, stored, and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
- Rights:
- GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB
84. Feature-based tagger
- Creator:
- Hajič, Jan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService
- Subject:
- morphology and tagger
- Description:
- The Feature-based (exponential model) Tagger is a fast implementation of the Czech tagger developed at UFAL and described in the PDT 1.0 documentation (Czech Language Tagging page). To get the best possible results, the tagger requires preprocessing by a Czech morphological module with very high coverage. This module covers a superset of the Czech "FM" morphology. Both the morphological module and the tagger are supplied as binary executables, together with all necessary precompiled Czech data. Input must be encoded in ISO Latin 2 (ISO-8859-2) and follow the csts.dtd definition; output is produced in the same form (ISO Latin 2 encoding, csts.dtd). As is the case with many of the tools provided with PDT 1.0, both executables also accept, and then produce, a "simplified SGML", which is not real, valid SGML but simply contains at least the tags for words, punctuation, and sentence breaks, one item per line.
- Rights:
- PDT 2.0 License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-pdt2, and ACA
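The encoding requirement in the description above can be illustrated with a minimal sketch, assuming the source text is held as a Python string (for example, read from a UTF-8 file). The function names `to_latin2` and `from_latin2` are hypothetical and are not part of the tagger distribution; the sketch only handles the character encoding and does not produce csts.dtd markup.

```python
def to_latin2(text: str) -> bytes:
    """Encode text as ISO-8859-2 (ISO Latin 2).

    Raises UnicodeEncodeError if the text contains a character
    that ISO-8859-2 cannot represent (for example, the euro sign).
    """
    return text.encode("iso-8859-2")


def from_latin2(data: bytes) -> str:
    """Decode ISO-8859-2 bytes back into a Python string."""
    return data.decode("iso-8859-2")


if __name__ == "__main__":
    sample = "Příliš žluťoučký kůň"
    encoded = to_latin2(sample)
    # ISO-8859-2 is a single-byte encoding: one byte per character.
    print(len(sample), len(encoded))
    print(from_latin2(encoded) == sample)
```

Strict encoding (the default) is used deliberately, so that unrepresentable characters surface as an error instead of being silently replaced before the text reaches the tagger.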
85. Fine-Tracker
- Publisher:
- Centre for Language and Speech Technology, Radboud University
- Type:
- toolService
- Description:
- Computational model of human word recognition; Fine-phonetic detail
- Rights:
- Not specified
86. FOLKER
- Publisher:
- Institut für Deutsche Sprache
- Type:
- toolService
- Description:
- Audio transcription editor used for the construction of the FOLK corpus
- Rights:
- Not specified
87. ForFun 1.0
- Creator:
- Mikulová, Marie and Bejček, Eduard
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- service and toolService
- Subject:
- form, function, database, and syntax
- Language:
- Czech
- Description:
- ForFun is a database of linguistic forms and their syntactic functions, built using the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The purpose of the Prague Database of Forms and Functions (ForFun) is to help linguists study the form-function relation, which we assume to be one of the principal tasks of both theoretical linguistics and natural language processing. A prototypical question to be asked is "What purposes does the preposition 'po' serve?" or "What linguistic means in a sentence can express the meaning 'a destination of an action'?". There are almost 1500 distinct forms (besides the preposition 'po') and 65 distinct functions (besides 'destination').
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
88. freeling
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- Web service consisting of the Freeling open source language analysis tool suite.
- Rights:
- Not specified
89. FreeLing
- Publisher:
- Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
- Type:
- toolService
- Language:
- Catalan, English, Galician, Italian, Portuguese, and Welsh
- Description:
- Open source language analysis tool suite: tokenizer, stemmer/lemmatizer, named entity recognizer, chunker/segmenter, morphosyntactic tagger, syntactic tagger, corpus processor, morphological tagger, semantic tagger, analyzer, word sense disambiguator.
- Rights:
- Not specified
90. freeling_dependency
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- Freeling-based dependency parser.
- Rights:
- Not specified
91. freeling_morpho
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- Freeling-based morphological analyzer.
- Rights:
- Not specified
92. freeling_parsed
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- Freeling-based shallow parser.
- Rights:
- Not specified
93. freeling_tagging
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- Freeling-based part-of-speech tagger.
- Rights:
- Not specified
94. freeling_tokenizer
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Description:
- Freeling-based text tokenizer.
- Rights:
- Not specified
95. Frequency list: Early Modern Finnish
- Publisher:
- The Research Institute for the Languages of Finland
- Type:
- toolService
- Subject:
- word frequencies
- Language:
- Finnish
- Description:
- Frequency list of the Corpus of Early Modern Finnish, 4 862 190 words
- Rights:
- Not specified
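For readers unfamiliar with the resource type, a word frequency list can be sketched as follows. This is a minimal illustration under stated assumptions: the tokenization rule (lowercased runs of word characters) and the function name `frequency_list` are hypothetical and are not taken from the institute's corpus tooling.

```python
import re
from collections import Counter
from typing import List, Tuple


def frequency_list(text: str) -> List[Tuple[str, int]]:
    """Return (word, count) pairs sorted by descending count.

    Tokens are lowercased runs of Unicode word characters; this is
    an assumed, simplistic tokenization chosen for illustration.
    """
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens).most_common()


if __name__ == "__main__":
    for word, count in frequency_list("Se on se ja se on"):
        print(word, count)
```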
96. Frequency list: Old Literary Finnish
- Publisher:
- The Research Institute for the Languages of Finland
- Type:
- toolService
- Language:
- Finnish
- Description:
- Frequency list of the Corpus of Old Literary Finnish, 3 425 382 words
- Rights:
- Not specified
97. Functional Morphology
- Creator:
- Forsberg, Markus and Ranta, Aarne
- Publisher:
- Språkbanken, Dept. of Swedish Language, Göteborg University
- Type:
- toolService
- Subject:
- morphology
- Description:
- Functional Morphology is a development environment for computational morphologies.
- Rights:
- Not specified
98. GATE-ANNIE
- Publisher:
- University of Sheffield
- Type:
- toolService
- Description:
- GATE-ANNIE, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver, and named entity recogniser. The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages). GATE-ANNIE returns the fully annotated document in GATE XML format. The file saved by the client contains ANNIE's output in the default AnnotationSet and the input document's HTML or XML mark-up in the "Original markups" AnnotationSet. Reference: H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02).
- Rights:
- Not specified
99. GATE-ANNIE-RDF
- Publisher:
- University of Sheffield
- Type:
- toolService
- Description:
- ANNIE-RDF, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver, and named entity recogniser. The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages). The text spans and annotations are exported into an RDF-XML ontology, in which the recognized named entities are instances according to the PROTON ontology (http://proton.semanticweb.org/). Reference: H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02).
- Rights:
- Not specified
100. Gesprächanalytisches Informationssystem (GAIS)
- Publisher:
- Institut für Deutsche Sprache
- Type:
- toolService
- Language:
- German
- Description:
- Web-based information system covering the scientific community (news, events, persons, job market, mailing list, database of research projects and corpora, bibliography, glossary, and links) and recording equipment/software. Disciplinary scope: research on conversation analysis, discourse analysis, and spoken language.
- Rights:
- Not specified