Korektor

The Korektor web service is available at http(s)://lindat.mff.cuni.cz/services/korektor/api/.

The web service is freely available for testing. Respect the CC BY-NC-SA licence of the models – explicit written permission of the authors is required for any commercial exploitation of the system. If you use the service, you agree that data obtained by us during such use can be used for further improvements of the systems at UFAL. All comments and reactions are welcome.

API Reference

The Korektor REST API can be accessed directly or via any web programming tool that supports standard HTTP request methods and can handle JSON output.

Service Request	Description	HTTP Method
models	return list of models	GET/POST
correct	correct given text according to chosen model	GET/POST
suggestions	generate spelling suggestions for the given text according to chosen model	GET/POST

Method models

Return the list of available models. The default model (used when the user supplies no model to a method call) is also returned – this is guaranteed to be the latest Czech spellchecking model.

Browser Example

http://lindat.mff.cuni.cz/services/korektor/api/models

JSON Response

The response object contains two fields: models (an array of existing model names) and default_model (the model that is used when no model is specified).

Example JSON Response

{
 "models": [
  "czech-spellchecker-130202",
  "czech-diacritics_generator-130202",
  "strip_diacritics-130202"
 ],
 "default_model": "czech-spellchecker-130202"
}
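
As a minimal sketch of consuming this response in Python, the snippet below parses the example payload shown above and checks that default_model is one of the listed models. The function name parse_models is illustrative only and is not part of the service.

```python
import json

# Sample payload copied from the example JSON response above.
SAMPLE = '''{
 "models": [
  "czech-spellchecker-130202",
  "czech-diacritics_generator-130202",
  "strip_diacritics-130202"
 ],
 "default_model": "czech-spellchecker-130202"
}'''

def parse_models(payload):
    """Return (models, default_model) from the body of a models response."""
    doc = json.loads(payload)
    models = doc["models"]
    default_model = doc["default_model"]
    # The documentation describes default_model as one of the models.
    if default_model not in models:
        raise ValueError("default_model is not listed in models")
    return models, default_model

models, default_model = parse_models(SAMPLE)
print(default_model)
```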

Method correct

Auto-correct the given text according to the chosen model and return the corrected text as a string. The response format is described under Common Response Format.

Parameter	Mandatory	Data type	Description
data	yes	string	Input text in UTF-8.
model	no	string	Model to use; see model selection for model matching rules.
input	no	string (`untokenized` / `untokenized_lines` / `segmented` / `vertical` / `horizontal`)	Input format to use; default is `untokenized`.

Browser Examples

http://lindat.mff.cuni.cz/services/korektor/api/correct?data=Přílyš žluťoučky kůň ůpěl ďábelské ódi.

http://lindat.mff.cuni.cz/services/korektor/api/correct?data=Příliš žluťoučký kůň úpěl ďábelské ódy .&input=horizontal&model=strip_diacritics
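
The browser examples rely on the browser to percent-encode spaces and non-ASCII characters. A program that builds such URLs has to do the encoding itself. The following is a minimal sketch using only the Python standard library; build_url is a hypothetical helper name, and the query parameters are the ones listed in the table above.

```python
from urllib.parse import urlencode, quote

API_BASE = "http://lindat.mff.cuni.cz/services/korektor/api/"

def build_url(method, data, **params):
    """Build a GET URL for an API method with UTF-8 percent-encoded parameters."""
    query = {"data": data}
    # Optional parameters (model, input, ...) are added only when given.
    query.update({k: v for k, v in params.items() if v is not None})
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return API_BASE + method + "?" + urlencode(query, quote_via=quote)

url = build_url("correct", "Příliš žluťoučký kůň úpěl ďábelské ódy .",
                input="horizontal", model="strip_diacritics")
print(url)
```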

Method suggestions

Generate spelling suggestions for the given text. For every located error, a list of suggestions is returned, ordered from the most probable to the least probable. The user can limit the number of suggestions returned. The response format is described under Result Object and Common Response Format.

Parameter	Mandatory	Data type	Description
data	yes	string	Input text in UTF-8.
model	no	string	Model to use; see model selection for model matching rules.
input	no	string (`untokenized` / `untokenized_lines` / `segmented` / `vertical` / `horizontal`)	Input format to use; default is `untokenized`.
suggestions	no	positive integer	The maximum number of suggestions to return for a single token. If unspecified, value 5 is used.

Browser Examples

http://lindat.mff.cuni.cz/services/korektor/api/suggestions?data=Přílyš žluťoučky kůň ůpěl ďábelské ódi.

http://lindat.mff.cuni.cz/services/korektor/api/suggestions?data=Prilis zlutoucky kun upel dabelske ody.&model=czech-diacritics_generator&suggestions=3

Result Object

The result field of the response is an array of suggestions. Each suggestion is an array of strings: the first element is the original piece of text, and the remaining elements (which may be absent) are the suggested replacements, ordered from the most probable to the least probable. Concatenating the first elements of all suggestions yields the original text.

Example JSON Response

{
 "model": "czech-spellchecker-130202",
 "acknowledgements": [
  "http://ufal.mff.cuni.cz/korektor#korektor_acknowledgements",
  "https://ufal.mff.cuni.cz/korektor/users-manual#korektor-czech_acknowledgements"
 ],
 "result": [["Přílyš","Příliš"],[" "],["žluťoučky","žluťoučký","žluťoučké"],[" kůň "],["ůpěl","úpěl","pěl"],[" ďábelské "],["ódi","ódy","zdi"],["."]]
}
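
The structure of the result array can be illustrated with a short Python sketch that rebuilds the original text and, separately, replaces each flagged piece with its most probable suggestion. The data is the result array from the example response above; the two function names are illustrative.

```python
# The "result" array from the example JSON response above.
RESULT = [["Přílyš", "Příliš"], [" "], ["žluťoučky", "žluťoučký", "žluťoučké"],
          [" kůň "], ["ůpěl", "úpěl", "pěl"], [" ďábelské "],
          ["ódi", "ódy", "zdi"], ["."]]

def original_text(result):
    """Concatenate the first elements, which reproduces the input text."""
    return "".join(item[0] for item in result)

def apply_top_suggestions(result):
    """Use the most probable suggestion where one exists, else keep the text."""
    return "".join(item[1] if len(item) > 1 else item[0] for item in result)

print(original_text(RESULT))
print(apply_top_suggestions(RESULT))
```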

Common Response Format

The response format of all methods is JSON. Except for the models method, the output JSON has the following structure (with result_object being usually a string or an array):

{
 "model": "Model used",
 "acknowledgements": ["URL with acknowledgements", ...],
 "result": result_object
}
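
A client will typically want to check for these three fields before using the result. The sketch below does this in Python; unpack_response is a hypothetical helper name, and the sample payload is abbreviated from the suggestions example response earlier in this document.

```python
import json

# Abbreviated from the suggestions example response earlier in this document.
SAMPLE = '''{
 "model": "czech-spellchecker-130202",
 "acknowledgements": [
  "http://ufal.mff.cuni.cz/korektor#korektor_acknowledgements"
 ],
 "result": [["Přílyš", "Příliš"], [" "]]
}'''

REQUIRED_FIELDS = ("model", "acknowledgements", "result")

def unpack_response(payload):
    """Return (model, acknowledgements, result) or raise ValueError."""
    doc = json.loads(payload)
    missing = [name for name in REQUIRED_FIELDS if name not in doc]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return doc["model"], doc["acknowledgements"], doc["result"]

model, acknowledgements, result = unpack_response(SAMPLE)
print(model)
```

The models method does not use this structure, so its response would be rejected by this check.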

Model Selection

There are several ways to select the required model using the model option:

If the model option is not specified, the default model (returned by the models method) is used – this is guaranteed to be the latest Czech spellchecking model.
The model option can specify one of the models returned by the models method.
The version info in the -YYMMDD format can be left out when supplying the model option – the latest available model will be used.
The model option may consist of only the first several words of a model name (for example czech). In this case, the latest most suitable model is used.
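
The matching rules can be approximated on the client side, for example to predict which model a request will use. The Python sketch below is only an illustration: the server's actual implementation is not documented here, and in particular the meaning of "most suitable" for prefix matches is an assumption (this sketch prefers the newest version and, among equally new models, the default model). The function names are hypothetical.

```python
import re

# Version suffix in the -YYMMDD format described above.
VERSION = re.compile(r"-(\d{6})$")

def split_version(name):
    """Split 'base-YYMMDD' into (base, 'YYMMDD'); version is '' if absent."""
    match = VERSION.search(name)
    if match:
        return name[:match.start()], match.group(1)
    return name, ""

def resolve_model(requested, models, default_model):
    """Approximate the documented rules; return None when nothing matches."""
    if not requested:
        return default_model              # no model option given
    if requested in models:
        return requested                  # exact model name
    # Version left out: take the latest model with the same base name.
    same_base = [m for m in models if split_version(m)[0] == requested]
    if same_base:
        return max(same_base, key=lambda m: split_version(m)[1])
    # Leading part of a name: ASSUMED tie-breaking, see the text above.
    by_prefix = [m for m in models if m.startswith(requested)]
    if not by_prefix:
        return None
    latest = max(split_version(m)[1] for m in by_prefix)
    newest = [m for m in by_prefix if split_version(m)[1] == latest]
    return default_model if default_model in newest else newest[0]

MODELS = ["czech-spellchecker-130202",
          "czech-diacritics_generator-130202",
          "strip_diacritics-130202"]
print(resolve_model("strip_diacritics", MODELS, MODELS[0]))
```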

Using Curl to Access the API

The API is easy to use with curl. Several examples follow:

Passing Input on the Command Line (requires a UTF-8 locale)

curl --data-urlencode 'data=Přílyš žluťoučky kůň ůpěl ďábelské ódi.' http://lindat.mff.cuni.cz/services/korektor/api/correct

Using Files as Input (files must be in UTF-8 encoding)

curl -F 'data=@input_file' http://lindat.mff.cuni.cz/services/korektor/api/suggestions

Specifying Additional Parameters

curl -F 'data=@input_file' -F 'model=czech-diacritics_generator' -F 'suggestions=3' http://lindat.mff.cuni.cz/services/korektor/api/suggestions

Converting JSON Result to Plain Text

curl -F 'data=@input_file' http://lindat.mff.cuni.cz/services/korektor/api/correct | PYTHONIOENCODING=utf-8 python -c "import sys,json; sys.stdout.write(json.load(sys.stdin)['result'])"
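
Using Python Instead of Curl

For scripts that should not depend on curl, roughly the same request as in the --data-urlencode example can be made with the Python standard library. This is a sketch under the assumption that the service accepts form-encoded POST bodies in the same way as it does from curl; build_request and call are hypothetical helper names. Only build_request runs when the snippet is executed, so no network access happens until call is invoked.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "http://lindat.mff.cuni.cz/services/korektor/api/"

def build_request(method, data, **params):
    """Prepare a form-encoded POST request for the given API method."""
    fields = {"data": data, **params}
    body = urlencode(fields).encode("ascii")  # UTF-8 text is percent-encoded
    return Request(API_BASE + method, data=body)  # a body makes this a POST

def call(method, data, **params):
    """Send the request and return the 'result' field of the JSON response."""
    with urlopen(build_request(method, data, **params)) as response:
        return json.load(response)["result"]

request = build_request("correct", "Přílyš žluťoučky kůň ůpěl ďábelské ódi.")
print(request.get_method(), request.full_url)
```

Calling call("correct", text) would then return the corrected text as a string, and call("suggestions", text, suggestions=3) the array described under Result Object.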