Rights: Not specified - LINDAT/CLARIAH-CZ Catalog Search Results


61. Cercador OBNEO

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Search engine for the BOBNEO data bank, a database of neologisms attested in written and oral mass media in Spanish and Catalan since 1992.
Rights:: Not specified

62. Cesilko Web Service for Weblicht

Creator:: Hajič, Jan, Kuboň, Vladislav, and Homola, Petr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and service
Subject:: machine translation
Description:: Weblicht integration of Cesilko (http://hdl.handle.net/11858/00-097C-0000-0006-AAFE-A)
Rights:: Not specified

63. CLaRK System - an XML-based system for Corpora Development

Publisher:: Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:: toolService
Subject:: corpus development
Description:: The CLaRK System incorporates several technologies:
- XML technology
- Unicode
- Cascaded Regular Grammars
- Constraints over XML Documents

On the basis of these technologies the following tools are implemented: XML Editor, Unicode Tokeniser, Sorting tool, Removing and Extracting tool, Concordancer, XSLT tool, Cascaded Regular Grammar tool, etc.

1 Unicode tokenization
To make it possible to impose constraints on textual nodes and to segment them in a meaningful way, the CLaRK System supports a user-defined hierarchy of tokenisers. At the most basic level the user can define a tokeniser in terms of a set of token types. In this basic tokeniser each token type is defined by a set of Unicode symbols. Above these basic tokenisers, the user can define other tokenisers whose token types are defined as regular expressions over the tokens of some other tokeniser, the so-called parent tokeniser.

2 Regular Grammars
The regular grammars are the basic mechanism for linguistic processing of the content of an XML document within the system. The regular grammar processor applies a set of rules to the content of some elements in the document and incorporates the categories of the rules back into the document as XML mark-up. Before the grammar rules are applied, the content is processed as follows: textual nodes are tokenized with respect to an appropriate tokeniser, and element nodes are textualized on the basis of XPath expressions that determine the important information about the element. The recognized word is replaced by new XML mark-up, which may or may not contain the word.

3 Constraints
The constraints implemented in the CLaRK System are generally based on the XPath language. XPath expressions are used to select data within one or several XML documents, and predicates are then evaluated over that data. A constraint can be used in two modes. In the first mode the constraint is used for a validity check, similar to the validity check based on a DTD or an XML schema. In the second mode the constraint is used to support changing the document so that it satisfies the constraint. Three types of constraints are implemented in the system: regular expression constraints, number restriction constraints, and value restriction constraints.

4 Macro Language
In the CLaRK System the tools support a mechanism for describing their settings. On the basis of these descriptions (called queries) a tool can be applied simply by pointing to a description record. Each query contains the states of all settings and options of the corresponding tool. Given such queries, a special tool combines and applies them in groups (macros). During application the queries are executed successively, and the result of one application is the input for the next. For better control over the process of applying several queries together, the system provides several conditional operators. These operators determine the next query to apply depending on certain conditions. When the condition of such an operator is satisfied, execution continues from a location defined in the operator. Queries are addressed by user-defined labels. When the condition is not satisfied, the operator is ignored and the process continues from the position following the operator. In this way constructions like IF-THEN-ELSE and WHILE-DO can easily be expressed. The system supports five types of control operators:
- IF (XPath): the condition is an XPath expression which is evaluated on the current working document. The condition is satisfied if the result is a non-empty node-set, a non-empty string, a positive number, or a true boolean value.
- IF NOT (XPath): the same kind of condition as the previous one, but with the result negated.
- IF CHANGED: the condition is satisfied if the preceding operation has changed the current working document or has produced a non-empty result document (depending on the operation).
- IF NOT CHANGED: the condition is satisfied if the previous operation either did not change the working document or did not produce a non-empty result.
- GOTO: unconditionally changes the execution position.

Each macro defined in the system can have its own query and can be incorporated in another macro. In this way a limited form of subroutine can be implemented. The new version of CLaRK will support server applications and calls to and from external programs.
Rights:: Not specified
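
The control flow of the macro language described above can be illustrated with a small interpreter. This is only a sketch under stated assumptions, not CLaRK's actual implementation or API: the names `Step` and `run_macro`, the operator spellings, and the example query `remove_one_tmp` are hypothetical, and conditions use the limited XPath subset of Python's `xml.etree.ElementTree`, so only the "non-empty node-set" case of the IF operator is modelled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One macro entry: a query or a control operator (hypothetical model)."""
    kind: str                      # QUERY, IF, IF_NOT, IF_CHANGED, IF_NOT_CHANGED, GOTO
    label: Optional[str] = None    # user-defined label addressing this step
    action: Optional[Callable[[ET.Element], bool]] = None  # QUERY: returns True if doc changed
    xpath: Optional[str] = None    # condition for IF / IF_NOT
    target: Optional[str] = None   # label to jump to when the condition holds


def run_macro(steps: List[Step], doc: ET.Element, max_steps: int = 1000) -> ET.Element:
    """Execute steps in order; a satisfied operator jumps to its target label."""
    labels = {s.label: i for i, s in enumerate(steps) if s.label}
    pc, changed, executed = 0, False, 0
    while pc < len(steps):
        executed += 1
        if executed > max_steps:   # guard against non-terminating macros
            raise RuntimeError("macro did not terminate")
        s = steps[pc]
        if s.kind == "QUERY":
            changed = s.action(doc)
            pc += 1
            continue
        if s.kind == "GOTO":
            jump = True
        elif s.kind == "IF":
            jump = bool(doc.findall(s.xpath))   # non-empty node-set only
        elif s.kind == "IF_NOT":
            jump = not doc.findall(s.xpath)
        elif s.kind == "IF_CHANGED":
            jump = changed
        elif s.kind == "IF_NOT_CHANGED":
            jump = not changed
        else:
            raise ValueError(f"unknown step kind: {s.kind}")
        # An unsatisfied operator is ignored and execution moves on.
        pc = labels[s.target] if jump else pc + 1
    return doc


def remove_one_tmp(doc: ET.Element) -> bool:
    """Example query: remove the first <tmp> child found; report whether doc changed."""
    for parent in doc.iter():
        child = parent.find("tmp")
        if child is not None:
            parent.remove(child)
            return True
    return False


# WHILE-DO expressed with a label and an IF operator:
# while the document still contains <tmp>, apply the query again.
while_macro = [
    Step("QUERY", label="loop", action=remove_one_tmp),
    Step("IF", xpath=".//tmp", target="loop"),
]
result = run_macro(while_macro, ET.fromstring("<s><tmp/><w>a</w><tmp/></s>"))
print(ET.tostring(result, encoding="unicode"))
```

Replacing the second step with `Step("IF_CHANGED", target="loop")` gives the same loop driven by the IF CHANGED operator instead of an XPath test.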

64. CLaRK System - XML-based system for Corpora Development

Creator:: Simov, Kiril, Simov, Alex, and Kouylekov, Milen
Publisher:: Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
Type:: toolService
Description:: Identical to the description of entry 63 (CLaRK System - an XML-based system for Corpora Development).
Rights:: Not specified
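
The user-defined hierarchy of tokenisers in the CLaRK System description can be sketched as follows. This is a minimal illustration under assumptions, not CLaRK's actual code: the token type names (`LAT`, `NUM`, `DECIMAL`, ...), the one-letter codes, and the functions `basic_tokenise` and `derived_tokenise` are hypothetical. The basic tokeniser defines each token type by a set of Unicode symbols, and the derived tokeniser defines its token types as regular expressions over the token types of its parent tokeniser.

```python
import re
from itertools import groupby

# Basic tokeniser: each token type is a set of Unicode symbols (illustrative sets).
BASIC_TYPES = {
    "LAT": set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"),
    "NUM": set("0123456789"),
    "SPACE": set(" \t\n"),
    "PUNCT": set(".,;:!?-"),
}


def basic_tokenise(text):
    """Split text into maximal runs of characters of the same basic type."""
    def type_of(ch):
        for name, chars in BASIC_TYPES.items():
            if ch in chars:
                return name
        return "OTHER"
    return [(t, "".join(group)) for t, group in groupby(text, key=type_of)]


# Derived tokeniser: token types are regular expressions over parent token types.
# Each parent type is encoded as one letter so that Python's re can match sequences.
CODES = {"LAT": "L", "NUM": "N", "SPACE": "S", "PUNCT": "P", "OTHER": "O"}
DERIVED_TYPES = [
    ("DECIMAL", re.compile(r"NPN")),  # number, punctuation, number, e.g. 3.14
]


def derived_tokenise(text):
    """Merge parent tokens whose type sequence matches a derived token type."""
    parent = basic_tokenise(text)
    codes = "".join(CODES[t] for t, _ in parent)
    out, i = [], 0
    while i < len(parent):
        for name, pattern in DERIVED_TYPES:
            m = pattern.match(codes, i)
            if m:
                out.append((name, "".join(tok for _, tok in parent[i:m.end()])))
                i = m.end()
                break
        else:
            out.append(parent[i])  # no derived type applies: keep the parent token
            i += 1
    return out


tokens = derived_tokenise("pi is 3.14")
print(tokens)
```

Matching on token types alone is a simplification; a fuller version could also let derived patterns inspect the token text, for example to accept only `.` or `,` as a decimal separator.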

65. CLIPS : corpora e lessici di italiano parlato e scritto

Publisher:: Università degli studi di Napoli Federico II
Type:: corpus
Language:: Italian
Description:: Audio files of about 100 hours of speech from 15 different cities in Italy. Transcriptions of various recordings are available to read in PDF.
Rights:: Not specified

66. COLDIC

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Tool for dictionary management
Rights:: Not specified

67. Collection of Latvian literature

Publisher:: Tilde
Type:: corpus
Language:: Latvian
Description:: Masterpieces of Latvian literature from its beginnings until the first decades of the 20th century.
Rights:: Not specified

68. Cologne Digital Sanskrit Dictionaries

Publisher:: Institute of Indology and Tamil Studies, Cologne University
Type:: lexicalConceptualResource
Language:: Sanskrit
Description:: Sanskrit lexicons. The data is made available as scanned images of the works as well as a digitization of the scanned images, which permits computer-aided analyses and displays of the work. Can be downloaded or queried online.
Rights:: Not specified

69. COLT – The Bergen Corpus of London Teenage Language

Type:: corpus
Language:: English
Description:: British English (London); Spoken, general, age-specific dialect corpus; 500 000 words, 55 hrs of recording; POS, speaker/conversation metainfo
Rights:: Not specified

70. COMPARA : Portuguese - English parallel translation corpus

Type:: corpus
Language:: English and Portuguese
Description:: Bi-directional parallel corpus based on an open-ended collection of Portuguese-English and English-Portuguese source texts and translations. Searchable via the IMS Corpus Query Processor and the DISPARA interface.
Rights:: Not specified
