Multi-Lingual OCR Service

Note that this service is under active development and may change without warning. This note will be removed once the service stabilises.

This service uses optical character recognition (OCR) to identify text contained within images. It is multi-lingual and not restricted to Latin scripts. It works in three stages: first it determines the bounding boxes of related text within the image, then it extracts the text from within each bounding box, and finally it determines the language of the extracted text.

Supported image formats are JPEG, PNG, GIF, TIFF and WebP.

There are two ways to call this service: you can pass in the URL of an image to process, or you can upload an image directly. Both approaches use the same endpoint, https://cloud-api.gate.ac.uk/process/ml-ocr, and calls can be authenticated using a GATE Cloud API Key via Basic Authentication.

To process an image available at a publicly accessible URL, simply make a GET request, passing the image URL as the url query parameter.
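As a rough illustration of the GET form of the call, the sketch below builds the request with Python's standard library. The endpoint is the API base URL quoted under "Use this service". The function names are hypothetical, and the assumption that an API key consists of an ID and password pair used as the Basic Authentication username and password should be checked against the API documentation.

```python
import base64
import json
import urllib.parse
import urllib.request

# API base URL quoted elsewhere on this page.
ENDPOINT = "https://cloud-api.gate.ac.uk/process/ml-ocr"


def build_get_request(image_url, key_id=None, password=None):
    """Build (but do not send) a GET request for an image at a public URL."""
    # The image URL must be percent-encoded before it goes into the query string.
    query = urllib.parse.urlencode({"url": image_url})
    request = urllib.request.Request(f"{ENDPOINT}?{query}", method="GET")
    if key_id is not None and password is not None:
        # Assumption: the API key is an ID/password pair sent via Basic Authentication.
        credentials = f"{key_id}:{password}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def ocr_from_url(image_url, key_id=None, password=None):
    """Send the request and decode the JSON response (performs a network call)."""
    request = build_get_request(image_url, key_id, password)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes it possible to inspect the exact URL and headers before anything is sent.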

If instead you wish to upload an image for processing, POST the file contents to the same endpoint. The maximum supported file size is 1MB.
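The upload form of the call might look like the following sketch, which checks the size limit locally before building the POST request. Several details are assumptions rather than documented behaviour: the file contents are sent as the raw request body, a Content-Type header matching the image format is supplied, and "1MB" is interpreted conservatively as 1,000,000 bytes. The function name is hypothetical.

```python
import urllib.request

ENDPOINT = "https://cloud-api.gate.ac.uk/process/ml-ocr"

# Assumption: "1MB" is read as 1,000,000 bytes, the stricter interpretation.
MAX_UPLOAD_BYTES = 1_000_000


def build_upload_request(image_bytes, content_type="image/png"):
    """Build (but do not send) a POST request carrying the image bytes."""
    if len(image_bytes) > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"image is {len(image_bytes)} bytes; the limit is {MAX_UPLOAD_BYTES}"
        )
    # Assumption: the raw file contents form the request body.
    request = urllib.request.Request(ENDPOINT, data=image_bytes, method="POST")
    # Assumption: the service accepts a Content-Type matching the image format.
    request.add_header("Content-Type", content_type)
    return request
```

The resulting request can be passed to urllib.request.urlopen in the same way as a GET request, with an Authorization header added if an API key is used.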

By default the service will auto-detect the predominant script of the text within an image. In general this works reliably, but you can also explicitly set the script via a script query parameter. Currently supported values for this parameter are the two auto-detect modes and the ISO 15924 codes shown in the following table:

value	description
auto	Auto Detect (Single Script)
loop	Auto Detect (Multiple Scripts)
arab	Arabic
beng	Bengali-Assamese
cyrl	Cyrillic
deva	Devanagari
hans	Han (simplified variant)
hant	Han (traditional variant)
hang	Hangul
jpan	Han, Hiragana, and Katakana
latn	Latin
taml	Tamil
telu	Telugu
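A client may want to validate the script value before making a call. The sketch below copies the table above into a dictionary and produces the corresponding query string fragment. The function name is hypothetical, and normalising the value to lower case is a convenience of this sketch, not something the service is known to require.

```python
import urllib.parse

# Values and descriptions copied from the table above.
SUPPORTED_SCRIPTS = {
    "auto": "Auto Detect (Single Script)",
    "loop": "Auto Detect (Multiple Scripts)",
    "arab": "Arabic",
    "beng": "Bengali-Assamese",
    "cyrl": "Cyrillic",
    "deva": "Devanagari",
    "hans": "Han (simplified variant)",
    "hant": "Han (traditional variant)",
    "hang": "Hangul",
    "jpan": "Han, Hiragana, and Katakana",
    "latn": "Latin",
    "taml": "Tamil",
    "telu": "Telugu",
}


def script_query(script="auto"):
    """Return the 'script=...' query fragment, rejecting unsupported values."""
    code = script.strip().lower()
    if code not in SUPPORTED_SCRIPTS:
        raise ValueError(f"unsupported script value: {script!r}")
    return urllib.parse.urlencode({"script": code})
```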

Regardless of how you call the service, the response takes the same form. Specifically, the service returns a JSON object containing information on the main script detected (which is used to drive the OCR) and an array of bounding boxes within the image where text was detected. An example of such an object would be:

{
  "script": {
    "code": "latn",
    "name": "Latin",
    "probability": 0.9970128536224365
  },
  "bounding_boxes": [
    {
      "text": "Some text extracted from the image",
      "bounding_box": [[18,10],[1038,10],[1038,114],[18,114]],
      "language": {
        "code": "en",
        "name": "English",
        "probability": 0.6472072005271912
      },
      "script": {
        "code": "latn",
        "name": "Latin",
        "probability": 0.9970128536224365
      }
    }
  ]
}

Note that the probability for the scripts will be set to -1 if the script was explicitly provided as input to the process, i.e. was not auto-detected.
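To show how a response of this shape might be consumed, the sketch below extracts the text, language and extent of each bounding box, and uses the -1 probability convention to tell whether the script was auto-detected. It assumes each point in bounding_box is an [x, y] pair in pixels, which matches the example above but is not stated explicitly. The function name and the keys of the returned dictionary are hypothetical.

```python
import json


def summarise(response_text):
    """Reduce a service response to the script code and a list of text regions."""
    result = json.loads(response_text)
    script = result["script"]
    regions = []
    for box in result["bounding_boxes"]:
        # Assumption: each corner is an [x, y] pair in pixel coordinates.
        xs = [point[0] for point in box["bounding_box"]]
        ys = [point[1] for point in box["bounding_box"]]
        regions.append({
            "text": box["text"],
            "language": box["language"]["code"],
            "left": min(xs),
            "top": min(ys),
            "width": max(xs) - min(xs),
            "height": max(ys) - min(ys),
        })
    return {
        "script": script["code"],
        # A probability of -1 means the script was supplied by the caller.
        "auto_detected": script["probability"] != -1,
        "regions": regions,
    }
```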

Test this service

Whilst this service is primarily intended to be used as a REST API the following form allows you to test the service and see the results and information produced.

150 free requests / day

Batch processing not available

Use this service

You can make up to 150 API calls per day free of charge, at an average rate of 2 calls/sec. Higher quotas are available for research users by arrangement; contact us for details.
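One way a client could stay within the stated call rate is to space requests at least half a second apart. The sketch below is a minimal client-side throttle. It is stricter than necessary, since the quota is described as an average rate, and it does not track the daily limit of 150 calls. The class name is hypothetical, and the clock and sleep functions are parameters only so that the behaviour can be checked without real delays.

```python
import time


class Throttle:
    """Spaces calls so that no more than calls_per_second are started per second."""

    def __init__(self, calls_per_second=2.0, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / calls_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval
```

Calling wait() immediately before each request is enough; the first call never blocks.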

The API base URL for this service is:

https://cloud-api.gate.ac.uk/process/ml-ocr

See the API documentation for more details.
