WeVerify OCR Service

This application finds URLs in the input text and runs optical character recognition (OCR) on each one to extract any text from the image it points to. It creates URL annotations in the output, with features that record either success and the recognised text, or failure and the nature of the error (for example, an invalid or unreachable URL, or a valid URL that refers to something other than an image).

Default annotations
:URL URLs with the following additional features produced by the OCR service:
  • ocr_ok: true or false, indicating whether the service succeeded,
  • ocr_text: the output text (empty if unsuccessful),
  • ocr_error: the server error message (empty if successful),
  • ocr_response: the HTTP response code,
  • string: the URL itself.
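
As an illustration of how these features might be consumed, the following Python sketch summarises the URL annotations in a response. It assumes the response has already been parsed from JSON into a dictionary with a top-level "entities" key that maps annotation types to lists of annotations; that layout, the function name summarize_url_annotations, and the sample data (including the error message) are assumptions made for this example and are not taken from the service documentation.

```python
from typing import Any


def summarize_url_annotations(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one summary record per URL annotation in a parsed response.

    Assumes (hypothetically) that annotations live under
    response["entities"]["URL"]; adjust if the real layout differs.
    """
    annotations = response.get("entities", {}).get("URL", [])
    summaries = []
    for ann in annotations:
        # ocr_ok may arrive as a JSON boolean or as the string "true"/"false".
        ok = str(ann.get("ocr_ok", "")).lower() == "true"
        summaries.append(
            {
                "url": ann.get("string", ""),
                "ok": ok,
                # Only one of these two is expected to be non-empty.
                "detail": ann.get("ocr_text", "") if ok else ann.get("ocr_error", ""),
                "status": ann.get("ocr_response"),
            }
        )
    return summaries


# Made-up sample data for illustration only.
sample_response = {
    "text": "See https://example.org/sign.png and https://example.org/missing.png",
    "entities": {
        "URL": [
            {
                "string": "https://example.org/sign.png",
                "ocr_ok": True,
                "ocr_text": "NO ENTRY",
                "ocr_error": "",
                "ocr_response": 200,
            },
            {
                "string": "https://example.org/missing.png",
                "ocr_ok": "false",
                "ocr_text": "",
                "ocr_error": "Not Found",
                "ocr_response": 404,
            },
        ]
    },
}

for record in summarize_url_annotations(sample_response):
    print(record)
```

Checking ocr_ok first and then reading either ocr_text or ocr_error mirrors the feature descriptions above, where exactly one of the two is expected to be non-empty.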
1,200 free requests / day
Larger batches GBP0.80 / CPU hour

Use this pipeline

Single documents

You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
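
A client may want to pace its own requests to stay within these limits. The Python sketch below is one possible approach, not the service's own mechanism: it spaces calls at least half a second apart and stops after 1,200 calls. The class name RequestPacer is hypothetical, how the server actually enforces the rate and quota is not stated here, and the counter does not reset at the end of the day.

```python
import time
from typing import Callable


class RequestPacer:
    """Client-side pacing: a minimum gap between calls plus a simple call budget."""

    def __init__(
        self,
        max_per_second: float = 2.0,
        daily_quota: int = 1200,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / max_per_second
        self.daily_quota = daily_quota
        self.sent = 0
        self._clock = clock
        self._sleep = sleep
        self._next_slot = None  # earliest time the next call may start

    def acquire(self) -> bool:
        """Wait until the next call is allowed; return False once the budget is spent."""
        if self.sent >= self.daily_quota:
            return False
        now = self._clock()
        if self._next_slot is not None and now < self._next_slot:
            self._sleep(self._next_slot - now)
            now = self._next_slot
        self._next_slot = now + self.min_interval
        self.sent += 1
        return True
```

A caller would invoke acquire() before each request and stop submitting documents when it returns False.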

The API endpoint for this pipeline is:

https://cloud-api.gate.ac.uk/process-document/ocr-service
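
As a rough illustration, a request to this endpoint could be prepared in Python as sketched below. Only the endpoint URL comes from this page. The use of POST with a plain-text body, the JSON Accept header, and HTTP Basic authentication with an API key ID and password are assumptions about how the API is called, and the names build_request and process_text are hypothetical; check the API documentation before relying on any of them.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://cloud-api.gate.ac.uk/process-document/ocr-service"


def build_request(text: str, key_id: str, key_password: str) -> urllib.request.Request:
    """Prepare (but do not send) a request carrying the text to process.

    Assumed, not confirmed by this page: POST, text/plain body, Basic auth.
    """
    credentials = base64.b64encode(
        f"{key_id}:{key_password}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        ENDPOINT,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


def process_text(text: str, key_id: str, key_password: str) -> dict:
    """Send the request and decode the response, assuming it is JSON."""
    request = build_request(text, key_id, key_password)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Separating request construction from sending makes it possible to inspect the headers and body without contacting the service or consuming any of the daily quota.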

Batches of documents

You can process any amount of data with this pipeline on a pay-as-you-go basis, for GBP0.80 per CPU hour. The input can be data you upload yourself, data you have collected from Twitter, or the results of a previous job.
