This application uses optical character recognition (OCR) to identify text contained within images. The service can be used in two ways.

The first, which is the usual GATE Cloud approach, scans a text document for URLs and passes each URL it finds through the OCR system. A URL annotation is then created in the output, with features recording either success and the text identified, or failure and the nature of the error (for example an invalid or unreachable URL, or a valid URL that refers to something other than an image).

The alternative is to pass a single image file to the service, by Base64-encoding the image and posting the resulting data as plain text. The output from this approach is slightly different: the service simply passes back any extracted text, rather than adding annotations over the possibly large input data.

You can experiment with this second approach using the following input field to select a local file. The file will be loaded and Base64-encoded, and the result placed into the standard form below.

The Base64 approach is mostly intended for use via the REST API. If you simply want to test the service with images, the dedicated OCR demo page is easier to use.
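As a minimal sketch of the Base64 approach described above, the following Python builds (but does not send) a POST request whose body is the encoded image. The `ENDPOINT` value is a hypothetical placeholder, not the real endpoint for this pipeline, and the `text/plain` content type is an assumption based on the description of the data being posted as plain text; any authentication the API may require is not shown.

```python
import base64
import urllib.request

# Hypothetical placeholder: substitute the real API endpoint for this pipeline.
ENDPOINT = "https://example.invalid/ocr-endpoint"


def encode_image(image_bytes: bytes) -> str:
    """Return the Base64 text form of raw image bytes."""
    return base64.b64encode(image_bytes).decode("ascii")


def build_request(image_bytes: bytes, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build, without sending, a POST request carrying the encoded image."""
    body = encode_image(image_bytes).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "text/plain"},  # assumed content type
        method="POST",
    )
```

To use it, read a local image with `open(path, "rb").read()`, pass the bytes to `build_request`, and send the result with `urllib.request.urlopen` once the real endpoint has been filled in.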

Default annotations
:URL
URLs, with the following additional features produced by the OCR service:
  • ocr_ok: true or false, indicating whether the service succeeded,
  • ocr_text: the output text (empty if unsuccessful),
  • ocr_error: the server error message (empty if successful),
  • ocr_response: the HTTP response code,
  • string: the URL itself.
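The features listed above could be consumed along the following lines. This is only a sketch: it assumes the features of one URL annotation have already been extracted into a Python dict keyed by feature name, and that `ocr_ok` may arrive either as a boolean or as the string "true"/"false". The function name `summarise_url_annotation` is hypothetical, and how the annotations are serialised in the response is not covered here.

```python
def summarise_url_annotation(features: dict) -> str:
    """Return a one-line summary of a URL annotation's OCR result."""
    ok = features.get("ocr_ok")
    # Assumption: ocr_ok may be a real boolean or the string "true"/"false".
    succeeded = ok is True or str(ok).strip().lower() == "true"
    url = features.get("string", "")
    if succeeded:
        return f"{url}: {features.get('ocr_text', '')}"
    code = features.get("ocr_response", "?")
    error = features.get("ocr_error", "")
    return f"{url}: failed ({code}) {error}".rstrip()
```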
150 free requests / day
Batch processing not available

Single documents

You can process up to 150 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
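A client can stay within these limits by pacing its own requests. The sketch below is one conservative way to do that, not part of the service: the hypothetical `Throttle` class spaces calls at least 0.5 seconds apart (stricter than an average of 2 documents/sec requires) and counts down a daily allowance of 150. It does not model when the daily quota resets, which is not stated here.

```python
import time


class Throttle:
    """Client-side pacing: a minimum gap between calls plus a request budget."""

    def __init__(self, rate_per_sec=2.0, daily_quota=150,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_sec
        self.remaining = daily_quota
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def acquire(self):
        """Block until the next request may be sent, or raise if the budget is spent."""
        if self.remaining <= 0:
            raise RuntimeError("daily quota exhausted")
        now = self._clock()
        if self._last is not None:
            wait = self.min_interval - (now - self._last)
            if wait > 0:
                self._sleep(wait)
                now = self._clock()
        self._last = now
        self.remaining -= 1
```

Call `throttle.acquire()` immediately before each request; the `clock` and `sleep` parameters exist only so that the pacing can be tested without real delays.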

The API endpoint for this pipeline is:


