The Korektor web service is available at http(s)://lindat.mff.cuni.cz/services/korektor/api/.
The web service is freely available for testing. Respect the CC BY-NC-SA licence of the models – explicit written permission of the authors is required for any commercial exploitation of the system. If you use the service, you agree that data obtained by us during such use can be used for further improvements of the systems at UFAL. All comments and reactions are welcome.
The Korektor REST API can be accessed directly or via any web programming tool that supports standard HTTP request methods and JSON for output handling.
Service Request | Description | HTTP Method |
---|---|---|
models | return list of models | GET/POST |
correct | correct given text according to chosen model | GET/POST |
suggestions | generate spelling suggestions of the given text according to chosen model | GET/POST |
Return the list of available models. The default model (used when user supplies no model to a method call) is also returned – this is guaranteed to be the latest Czech spellchecking model.
http://lindat.mff.cuni.cz/services/korektor/api/models
The response object contains two fields: models (an array of existing model names) and default_model (the one of these models that is used when no model is specified).
{ "models": [ "czech-spellchecker-130202", "czech-diacritics_generator-130202", "strip_diacritics-130202" ], "default_model": "czech-spellchecker-130202" }
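As a minimal sketch of handling this response on the client side, the following Python snippet parses the sample body shown above and checks that the default model is one of the listed models. The function name `parse_models` is hypothetical and not part of the service; the sample string is copied from the example response.

```python
import json

# Sample body copied from the example response of the `models` method.
SAMPLE_MODELS_RESPONSE = """
{ "models": [ "czech-spellchecker-130202",
              "czech-diacritics_generator-130202",
              "strip_diacritics-130202" ],
  "default_model": "czech-spellchecker-130202" }
"""


def parse_models(body):
    """Return (models, default_model) from a `models` response body.

    Hypothetical helper: it only relies on the two documented fields.
    """
    payload = json.loads(body)
    models = payload["models"]
    default_model = payload["default_model"]
    if default_model not in models:
        raise ValueError("default_model is not among the listed models")
    return models, default_model


models, default_model = parse_models(SAMPLE_MODELS_RESPONSE)
print(default_model)  # czech-spellchecker-130202
```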
Auto-correct the given text according to the chosen model and return the corrected text as a string. The response format is described later.
Parameter | Mandatory | Data type | Description |
---|---|---|---|
data | yes | string | Input text in UTF-8. |
model | no | string | Model to use; see model selection for model matching rules. |
input | no | string (untokenized / untokenized_lines / segmented / vertical / horizontal ) | Input format to use; default is untokenized . |
http://lindat.mff.cuni.cz/services/korektor/api/correct?data=Přílyš žluťoučky kůň ůpěl ďábelské ódi.
http://lindat.mff.cuni.cz/services/korektor/api/correct?data=Příliš žluťoučký kůň úpěl ďábelské ódy .&input=horizontal&model=strip_diacritics
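The example URLs above contain raw spaces and non-ASCII characters, which browsers encode automatically but scripts usually have to percent-encode themselves. The following Python sketch builds such a GET URL with the standard library; `build_url` is a hypothetical helper, and encoding spaces as `%20` (rather than `+`) is a cautious assumption, not something the service is documented to require.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://lindat.mff.cuni.cz/services/korektor/api/"


def build_url(method, **params):
    """Build a GET URL for `method` with UTF-8 percent-encoded parameters.

    Parameters whose value is None are left out, so optional options such
    as `model` or `input` can simply be omitted.
    """
    given = {name: value for name, value in params.items() if value is not None}
    if not given:
        return BASE_URL + method
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return BASE_URL + method + "?" + urlencode(given, quote_via=quote)


print(build_url("models"))
print(build_url("correct", data="ódy .", input="horizontal", model="strip_diacritics"))
```

The second call prints a URL ending in `correct?data=%C3%B3dy%20.&input=horizontal&model=strip_diacritics`.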
Generate spelling suggestions for the given text. For every located error, a list of suggestions is returned, from the most probable to the least probable. The user can limit the number of suggestions returned. The response format is described later.
Parameter | Mandatory | Data type | Description |
---|---|---|---|
data | yes | string | Input text in UTF-8. |
model | no | string | Model to use; see model selection for model matching rules. |
input | no | string (untokenized / untokenized_lines / segmented / vertical / horizontal ) | Input format to use; default is untokenized . |
suggestions | no | positive integer | The maximum number of suggestions to return for a single token. If unspecified, value 5 is used. |
http://lindat.mff.cuni.cz/services/korektor/api/suggestions?data=Přílyš žluťoučky kůň ůpěl ďábelské ódi.
http://lindat.mff.cuni.cz/services/korektor/api/suggestions?data=Prilis zlutoucky kun upel dabelske ody.&model=czech-diacritics_generator&suggestions=3
The result field in the response format is an array of suggestions. Each suggestion is an array of strings, whose first element is the original piece of text and whose other elements (which may or may not be present) are the suggestions, from the most probable to the least probable. The concatenation of the first elements of all suggestions is equal to the original text.
{ "model": "czech-spellchecker-130202", "acknowledgements": [ "http://ufal.mff.cuni.cz/korektor#korektor_acknowledgements", "https://ufal.mff.cuni.cz/korektor/users-manual#korektor-czech_acknowledgements" ], "result": [["Přílyš","Příliš"],[" "],["žluťoučky","žluťoučký","žluťoučké"],[" kůň "],["ůpěl","úpěl","pěl"],[" ďábelské "],["ódi","ódy","zdi"],["."]] }
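To illustrate how the result array can be consumed, the following Python sketch takes the result field of the example response above, rebuilds the original text from the first elements, and produces a corrected text by picking the most probable suggestion wherever one exists. Both function names are hypothetical; always choosing the top suggestion is just one possible policy.

```python
# The "result" field copied from the example response above.
RESULT = [
    ["Přílyš", "Příliš"], [" "], ["žluťoučky", "žluťoučký", "žluťoučké"],
    [" kůň "], ["ůpěl", "úpěl", "pěl"], [" ďábelské "],
    ["ódi", "ódy", "zdi"], ["."],
]


def original_text(result):
    """Concatenate the first elements, which yields the original input."""
    return "".join(item[0] for item in result)


def apply_top_suggestions(result):
    """Replace every piece that has suggestions by its most probable one."""
    return "".join(item[1] if len(item) > 1 else item[0] for item in result)


print(original_text(RESULT))          # Přílyš žluťoučky kůň ůpěl ďábelské ódi.
print(apply_top_suggestions(RESULT))  # Příliš žluťoučký kůň úpěl ďábelské ódy.
```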
The response format of all methods is JSON. Except for the models method, the output JSON has the following structure (with result_object usually being a string or an array):
{ "model": "Model used", "acknowledgements": ["URL with acknowledgements", ...], "result": result_object }
There are several ways to select the required model using the model option:
- If the model option is not specified, the default model (returned by the models method) is used – this is guaranteed to be the latest Czech spellchecking model.
- The model option can specify one of the models returned by the models method.
- The version info in the -YYMMDD format can be left out when supplying the model option – the latest available model will be used.
- The model option may be only the first several words of the model name (for example czech). In this case, the latest most suitable model is used.

The API can be used from the command line with curl. Several examples follow:
curl --data-urlencode 'data=Přílyš žluťoučky kůň ůpěl ďábelské ódi.' http://lindat.mff.cuni.cz/services/korektor/api/correct
curl -F 'data=@input_file' http://lindat.mff.cuni.cz/services/korektor/api/suggestions
curl -F 'data=@input_file' -F 'model=czech-diacritics_generator' -F 'suggestions=3' http://lindat.mff.cuni.cz/services/korektor/api/suggestions
curl -F 'data=@input_file' http://lindat.mff.cuni.cz/services/korektor/api/correct | PYTHONIOENCODING=utf-8 python -c "import sys,json; sys.stdout.write(json.load(sys.stdin)['result'])"
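For scripts that prefer not to shell out to curl, a roughly equivalent request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: `build_request` and `call_korektor` are hypothetical names, and the body is sent as URL-encoded form data, mirroring the `--data-urlencode` example rather than the multipart `-F` uploads.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://lindat.mff.cuni.cz/services/korektor/api/"


def build_request(method, **params):
    """Build a POST request for `method` ("correct", "suggestions", ...).

    The parameters are UTF-8 percent-encoded into a form body; supplying
    a body makes urllib send the request as POST.
    """
    body = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(BASE_URL + method, data=body)


def call_korektor(method, **params):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_request(method, **params)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example usage (needs network access, so it is left commented out):
# reply = call_korektor("correct", data="Přílyš žluťoučky kůň ůpěl ďábelské ódi.")
# print(reply["result"])
```

Keeping request construction separate from sending makes it possible to inspect the URL and body without contacting the service.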