Data Management REST API
1. Introduction
This document describes the REST API provided by GATE Cloud to create, manage, upload and download data bundles: persistent data sets stored on the GATE Cloud platform. General information about GATE Cloud REST APIs can be found on this page.
2. List existing bundles
GET https://cloud.gate.ac.uk/api/data/bundle
Request a summary of all data bundles owned by the authenticating user.
Response format
XML | JSON |
<bundles> <bundle id="NN" name="Bundle name" url="https://cloud.gate.ac.uk/api/data/bundle/NN" downloadable="true" closed="true" /> ... </bundles> | [ { "id":NN, "name":"Bundle name", "url":"https://cloud.gate.ac.uk/api/data/bundle/NN", "downloadable":true, "closed":true }, ... ] |
The url can be queried to get more details about a specific bundle.
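As a minimal sketch of consuming the JSON form of this response, the Python below filters the list down to bundles that are both closed and downloadable. The function name ready_bundles and the sample ids and names are illustrative only, not part of the API. Fetching the response (including authentication, which this document does not describe) is left out.

```python
import json


def ready_bundles(response_text):
    """Return (id, url) pairs for bundles that are closed and downloadable."""
    bundles = json.loads(response_text)
    return [(b["id"], b["url"]) for b in bundles
            if b["closed"] and b["downloadable"]]


# Sample response in the documented JSON shape; the ids and names are made up.
sample = """[
  {"id": 1, "name": "News corpus",
   "url": "https://cloud.gate.ac.uk/api/data/bundle/1",
   "downloadable": true, "closed": true},
  {"id": 2, "name": "Upload in progress",
   "url": "https://cloud.gate.ac.uk/api/data/bundle/2",
   "downloadable": true, "closed": false}
]"""

for bundle_id, url in ready_bundles(sample):
    print(bundle_id, url)
```

Each returned url can then be queried for the bundle's full details.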
3. Details of a specific bundle
3.1. Query bundle details
GET https://cloud.gate.ac.uk/api/data/bundle/{bundleID}
Get the full details of a single data bundle.
Response format
XML | JSON |
<bundle> <key>value</key> ... <files> <file>{downloadURL}</file> ... </files> </bundle> | { "key":"value", ... "files":["downloadURL", ...] } |
Key | Value |
id | The bundle's identifier |
name | The bundle's display name |
url | The API URL from which the bundle's details can be fetched |
downloadable | (boolean) Does this bundle permit its contents to be downloaded directly? If false, the bundle can only be used as input to an annotation job |
closed | (boolean) Is this bundle complete and ready for use (true) or is it still open for further files to be uploaded (false)? |
dateCreated | Date when this bundle was first created |
totalSize | The total amount of data in this bundle, in bytes |
monthlyPrice | The total monthly storage cost for this bundle. May be zero for bundles that point to your own S3 bucket (i.e. you pay your own storage charges directly to Amazon) |
type | The type of files stored in this bundle, for bundles that are usable as input to an annotation job. The valid types are described in the job management API documentation |
encoding | The character encoding that an annotation job should use to read text documents out of this data bundle |
mimeTypeOverride | The MIME type that an annotation job should assume when reading documents from this bundle |
fileExtensions | (ZIP and TAR bundles only) Comma-separated list of file extensions identifying the entries within this bundle's archives that should be processed by an annotation job |
mimeTypeFilters | (ARC and WARC bundles only) Comma-separated list of MIME type prefixes that identify the entries in the archive that should be processed by an annotation job |
For full details of the type, encoding, mimeTypeOverride, fileExtensions and mimeTypeFilters options, see the job management API documentation.
Downloadable bundles also provide a list of URLs under the files property that can be used to download the bundle's contents. When retrieving these URLs you must follow all 3xx redirects.
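The following Python sketch shows one way to read the files list from a JSON details response and fetch each entry. It relies on the fact that urllib.request.urlopen follows 3xx redirects for GET requests by default. The helper names are illustrative, and deriving a local file name from the last path segment of the download URL is an assumption: this document does not say what the download URLs look like.

```python
import json
import posixpath
import shutil
import urllib.parse
import urllib.request


def download_urls(details_text):
    """Return the download URLs from a JSON bundle-details response."""
    details = json.loads(details_text)
    # Only downloadable bundles provide a "files" list.
    return details.get("files", [])


def local_name(download_url):
    """Derive a local file name from the URL path (assumed to be meaningful)."""
    path = urllib.parse.urlparse(download_url).path
    return posixpath.basename(path) or "download.bin"


def download(download_url, dest_path):
    """Fetch one file; urlopen follows 3xx redirects for GET by default."""
    with urllib.request.urlopen(download_url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

A caller would loop over download_urls(...) and call download(url, local_name(url)) for each entry.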
3.2. Update bundle details
POST https://cloud.gate.ac.uk/api/data/bundle/{bundleID}
Modify the bundle details. Currently the only modifiable "detail" is the bundle's display name.
Request format
XML | JSON |
<bundle> <name>New name</name> </bundle> | { "name":"New name" } |
Response
Exactly as for the GET request in section 3.1.
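A rename request could be built along these lines. This is a sketch only: the Content-Type header value is an assumption about how the service recognises a JSON body, the bundle ID 42 is made up, and authentication is omitted because this document does not describe it.

```python
import json
import urllib.request

BASE = "https://cloud.gate.ac.uk/api/data/bundle"


def rename_request(bundle_id, new_name):
    """Build (but do not send) the POST that changes a bundle's display name."""
    body = json.dumps({"name": new_name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/{bundle_id}",
        data=body,
        method="POST",
        # Assumption: JSON request bodies are identified by this Content-Type.
        headers={"Content-Type": "application/json"},
    )


req = rename_request(42, "New name")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen (with whatever authentication the service requires) would send it.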
4. Creating a new bundle
There are two options to create a new data bundle:
- Upload a set of files from your local machine
- Point to a set of files that are already stored in your own bucket on Amazon S3.
4.1. Uploading files to a bundle
Creating a bundle from uploaded files is a three-step process. First create the empty bundle, then add files one by one, and finally close the bundle. Note that all files uploaded to a bundle must be of the same kind (all ZIP files, all WARC files, etc.) and must share the same additional settings such as file extension filters.
POST https://cloud.gate.ac.uk/api/data/bundle
Request format
XML | JSON |
<bundle> <key>value</key> ... </bundle> | { "key":"value", ... } |
Key | Value |
name | The name for the new bundle |
type | The type of files stored in this bundle, for bundles that are usable as input to an annotation job. The valid types are described in the job management API documentation |
encoding | The character encoding that an annotation job should use to read text documents out of this data bundle |
mimeTypeOverride | The MIME type that an annotation job should assume when reading documents from this bundle |
fileExtensions | (ZIP and TAR bundles only) Comma-separated list of file extensions identifying the entries within this bundle's archives that should be processed by an annotation job |
mimeTypeFilters | (ARC and WARC bundles only) Comma-separated list of MIME type prefixes that identify the entries in the archive that should be processed by an annotation job |
For full details of the type, encoding, mimeTypeOverride, fileExtensions and mimeTypeFilters options, see the job management API documentation.
Response format
The details of the newly created bundle, in the same format as the GET bundle details response above.
Once the bundle has been created you can upload files:
POST https://cloud.gate.ac.uk/api/data/bundle/{bundleID}/add
Request format
XML | JSON |
<add> <!-- file name extension should match the bundle type --> <fileName>archiveName.zip</fileName> </add> | { /* file name extension should match * the bundle type */ "fileName":"archiveName.zip" } |
Response format
XML | JSON |
<putUrl>https://....</putUrl> | { "putUrl":"https://...." } |
The putUrl is a URL to which you can upload the file using an HTTP PUT request. It is only valid for a limited time, so you should upload your file to it immediately. A new URL is generated every time you POST to the .../add URL, so if your PUT URL has expired, simply POST again to generate a fresh one. The PUT request must have a Content-Type of "application/octet-stream" and the correct Content-Length, and must not have a Content-MD5 header.
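The header requirements for the PUT can be sketched in Python as follows. The function names are illustrative, and the example reads the whole file into memory for simplicity, which may not suit very large archives.

```python
import urllib.request


def put_request(put_url, payload):
    """Build the PUT request that uploads one file's bytes to its putUrl."""
    return urllib.request.Request(
        put_url,
        data=payload,
        method="PUT",
        headers={
            "Content-Type": "application/octet-stream",
            "Content-Length": str(len(payload)),
            # Deliberately no Content-MD5 header.
        },
    )


def upload(put_url, file_path):
    """Read a local file and send it to the PUT URL; returns the HTTP status."""
    with open(file_path, "rb") as f:
        payload = f.read()
    with urllib.request.urlopen(put_request(put_url, payload)) as resp:
        return resp.status
```

If the upload fails because the URL has expired, request a new putUrl from the .../add URL and call upload again.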
Once all files have been successfully uploaded to their generated PUT URLs the bundle must be closed:
POST https://cloud.gate.ac.uk/api/data/bundle/{bundleID}/close
Request format
XML | JSON |
<action>close</action> | { "action":"close" } |
Response format
The details of the bundle, in the same format as the GET bundle details response above.
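The create, add and close steps of section 4.1 can be tied together as in the sketch below. The two callables send and put_file are hypothetical helpers supplied by the caller (they are not part of the API); they keep the sequencing separate from HTTP transport and authentication, neither of which this document specifies.

```python
BASE = "https://cloud.gate.ac.uk/api/data/bundle"


def upload_bundle(send, put_file, settings, file_names):
    """Create a bundle, add each file, then close it.

    send(method, url, body) -> dict   hypothetical helper: makes one
                                      authenticated JSON call, returns the
                                      parsed JSON response
    put_file(put_url, file_name)      hypothetical helper: uploads one file
                                      to its PUT URL
    """
    # Step 1: create the empty bundle from its settings (name, type, ...).
    bundle = send("POST", BASE, settings)
    bundle_url = f"{BASE}/{bundle['id']}"

    # Step 2: request a fresh PUT URL per file and upload to it immediately,
    # because each PUT URL is only valid for a limited time.
    for file_name in file_names:
        reply = send("POST", f"{bundle_url}/add", {"fileName": file_name})
        put_file(reply["putUrl"], file_name)

    # Step 3: close the bundle once every upload has succeeded.
    return send("POST", f"{bundle_url}/close", {"action": "close"})
```

Requesting the PUT URL inside the loop, rather than collecting all URLs first, reduces the chance that a URL expires before its file is uploaded.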
4.2. Creating a bundle to reference existing files on Amazon S3
Creating a bundle that references existing files in your own Amazon S3 bucket is a single-step operation. Note that all files referenced by a bundle must be of the same kind (all ZIP files, all WARC files, etc.), must be downloadable using the same credentials, and must share the same additional settings such as file extension filters.
POST https://cloud.gate.ac.uk/api/data/bundle
Request format
XML | JSON |
<bundle> <key>value</key> ... <accessKey>AKIA......</accessKey> <secretKey>...</secretKey> <locations> <location>s3://bucketName/key</location> ... </locations> </bundle> | { "key":"value", ... "accessKey":"AKIA......", "secretKey":"...", "locations":[ "s3://bucketName/key", ... ] } |
Key | Value |
name | The name for the new bundle |
type | The type of files stored in this bundle, for bundles that are usable as input to an annotation job. The valid types are described in the job management API documentation |
encoding | The character encoding that an annotation job should use to read text documents out of this data bundle |
mimeTypeOverride | The MIME type that an annotation job should assume when reading documents from this bundle |
fileExtensions | (ZIP and TAR bundles only) Comma-separated list of file extensions identifying the entries within this bundle's archives that should be processed by an annotation job |
mimeTypeFilters | (ARC and WARC bundles only) Comma-separated list of MIME type prefixes that identify the entries in the archive that should be processed by an annotation job |
For full details of the type, encoding, mimeTypeOverride, fileExtensions and mimeTypeFilters options, see the job management API documentation.
The S3 locations are specified as pseudo-URLs of the form s3://bucketName/key. The accessKey and secretKey parameters specify the AWS credentials that GATE Cloud components should use when they need to fetch the files in the bundle. For security reasons you should not provide your AWS master credentials here; instead, create an IAM user whose rights are restricted to GET requests on the objects in the bundle.
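As an illustration, the JSON request body for an S3-backed bundle could be assembled as below. The function name is hypothetical, the credential and bucket values in the usage lines are placeholders, and the location check is a client-side convenience rather than a documented validation rule.

```python
import json


def s3_bundle_body(name, access_key, secret_key, locations, **settings):
    """Return the JSON body for creating a bundle that references S3 objects.

    Extra keyword arguments (type, encoding, ...) are passed through unchanged.
    """
    for loc in locations:
        bucket, _, key = loc[len("s3://"):].partition("/")
        if not loc.startswith("s3://") or not bucket or not key:
            raise ValueError(f"not an s3://bucketName/key location: {loc!r}")
    body = dict(settings, name=name, accessKey=access_key,
                secretKey=secret_key, locations=list(locations))
    return json.dumps(body)


# Placeholder values only; use the restricted IAM user's keys, never master keys.
print(s3_bundle_body("My bundle", "AKIA-PLACEHOLDER", "secret-placeholder",
                     ["s3://my-bucket/corpus/part1.zip"], encoding="UTF-8"))
```

The resulting string is the body of the POST to https://cloud.gate.ac.uk/api/data/bundle.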
Response format
The details of the newly created bundle, in the same format as the GET bundle details response above.
5. Deleting a data bundle
DELETE https://cloud.gate.ac.uk/api/data/bundle/{bundleID}
When you no longer require the data in a bundle you should delete it. For bundles that are stored in GATE Cloud managed storage you will be charged a monthly fee for each bundle based on its size, and you must delete the bundle to stop incurring charges.
For a bundle that points to objects in your own S3 bucket there are no monthly storage charges, and deleting the bundle will not delete the target objects.
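A delete call is a plain DELETE on the bundle URL. The sketch below only builds the request; the bundle ID is made up, and authentication is omitted because this document does not describe it.

```python
import urllib.request

BASE = "https://cloud.gate.ac.uk/api/data/bundle"


def delete_request(bundle_id):
    """Build (but do not send) the DELETE request for one bundle."""
    return urllib.request.Request(f"{BASE}/{bundle_id}", method="DELETE")


req = delete_request(42)
print(req.get_method(), req.full_url)
```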