URL Domain Analysis

An application that annotates URLs within a document, then attaches to each one what multiple organisations that analyse the credibility of online content have said about the domain (or sometimes the domain path) in the URL. A full list of sources and data collection methods can be found here.

Default annotations
:URL ([URL Object]) details of all URLs found in the document. Features (or JSON object values) are:
  :indices ([int, int]) start and end position in text where this URL is found
  :rule (string) to be ignored - used for debugging
  :category (string) part-of-speech category from the POS tagger (NN, JJ, etc.); set to URL for URL annotations
  :kind (string) type of token (word, number, etc.); set to URL for URL annotations
  :length (int) length of URL string
  :string (string) the URL itself
  :replaced (int) number of tokens merged to create URL
  :resolved-url (string) the URL once resolved to its final location, in case the URL entered redirects
  :resolved-domain (string) domain of the URL being evaluated
:SourceCredibility ([SourceCredibilityObject]) details of all source credibility information found for the given URLs. Features (or JSON object values) are:
  :indices ([int, int]) start and end position in text where this URL is found
  :rule (string) to be ignored - used for debugging
  :category (string) part-of-speech category from the POS tagger (NN, JJ, etc.); set to URL for URL annotations
  :kind (string) type of token (word, number, etc.); set to URL for URL annotations
  :length (int) length of URL string
  :string (string) the URL itself
  :replaced (int) number of tokens merged to create URL
  :resolved-url (string) the URL once resolved to its final location, in case the URL entered redirects
  :source (string) the source from which this credibility information is taken
  :source-type (string) whether the source gives information on trusted media, untrusted media or both. These correspond to the values "positive", "negative" and "neutral" respectively
  :domainOrAccount (string) whether the credibility information is about a domain or a social media page (e.g. Facebook, Twitter). Can be "domain" or "account"
  :updated (string, format: yyyyMMdd) when the data from this source was last updated
  :labels* (string) the labels given to the domain by this source
  :description (string) any extra information given by the source about the domain path
  :evidence* ([string]) usually URLs giving potential evidence that the label given by the source is correct

(*note: :labels and :evidence are currently duplicated by the deprecated :type and :debunks respectively, which are to be removed in the next release)
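As a rough illustration of how these features might be consumed, the sketch below groups SourceCredibility records by URL, falls back to the deprecated :type and :debunks features when :labels or :evidence are absent, and parses :updated from its yyyyMMdd form. The top-level "text"/"entities" layout of the response and every value in sample_response are assumptions made for the example, not output captured from the service; summarise_credibility is a hypothetical helper name.

```python
from datetime import datetime

# Hypothetical response. The "text"/"entities" wrapper is an assumption and
# all values are invented; only the feature names come from the list above.
sample_response = {
    "text": "See http://example.com/story for details.",
    "entities": {
        "URL": [
            {
                "indices": [4, 28],
                "string": "http://example.com/story",
                "resolved-url": "http://example.com/story",
                "resolved-domain": "example.com",
            }
        ],
        "SourceCredibility": [
            {
                "indices": [4, 28],
                "string": "http://example.com/story",
                "source": "example-source",
                "source-type": "negative",
                "domainOrAccount": "domain",
                "updated": "20200131",
                "labels": "satire",
                "evidence": ["http://example.org/report"],
            }
        ],
    },
}


def summarise_credibility(response):
    """Group SourceCredibility records by the URL string they refer to."""
    entities = response.get("entities", {})
    summary = {}
    for record in entities.get("SourceCredibility", []):
        # Prefer the current feature names, falling back to the deprecated
        # "type" and "debunks" features while both are still emitted.
        labels = record.get("labels", record.get("type"))
        evidence = record.get("evidence", record.get("debunks", []))
        updated = record.get("updated")
        summary.setdefault(record["string"], []).append(
            {
                "source": record.get("source"),
                "source_type": record.get("source-type"),
                "labels": labels,
                "evidence": evidence,
                # yyyyMMdd corresponds to %Y%m%d in Python's strptime.
                "updated": (
                    datetime.strptime(updated, "%Y%m%d").date() if updated else None
                ),
            }
        )
    return summary


if __name__ == "__main__":
    for url, records in summarise_credibility(sample_response).items():
        for rec in records:
            print(url, rec["source"], rec["source_type"], rec["labels"], rec["updated"])
```

Keying on :string keeps the example simple; if the same URL appears several times in a document, :indices would be the safer key.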

1,200 free requests / day
Batch processing not available

Use this pipeline

Single documents

You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
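A client that loops over many documents may want to pace itself against these limits. The following is a minimal sketch of one way to do that, not part of the service: Throttle is a hypothetical helper that spaces calls at least 0.5 seconds apart (stricter than the stated average rate) and counts down a fixed budget of 1,200 calls. It does not know when the service resets its daily quota, so the budget is simply per Throttle instance.

```python
import time


class Throttle:
    """Space out calls to respect a per-second rate and a fixed call budget."""

    def __init__(self, per_second=2.0, daily_quota=1200,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second  # 0.5 s at 2 documents/sec
        self.remaining = daily_quota
        self._clock = clock   # injectable so the class can be tested
        self._sleep = sleep
        self._last = None     # time of the previous permitted call

    def wait(self):
        """Block until the next request may be sent; raise if the budget is spent."""
        if self.remaining <= 0:
            raise RuntimeError("daily quota exhausted")
        now = self._clock()
        if self._last is not None:
            delay = self.min_interval - (now - self._last)
            if delay > 0:
                self._sleep(delay)
                now = self._clock()
        self._last = now
        self.remaining -= 1


if __name__ == "__main__":
    throttle = Throttle()
    for doc in ["first document", "second document", "third document"]:
        throttle.wait()
        print("would send:", doc)
```

Calling throttle.wait() immediately before each request is enough; the sleep only happens when calls arrive faster than the configured rate.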

The API endpoint for this pipeline is:

https://cloud-api.gate.ac.uk/process/credibility-full-urls
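A request to this endpoint might be built as in the sketch below, which uses only the Python standard library. Several details are assumptions rather than facts stated on this page: that the document is sent as a plain-text POST body, that JSON is requested through the Accept header, and that an API key is supplied as HTTP Basic credentials (key ID as user name, key password as password). build_request and annotate are hypothetical helper names; check the API documentation before relying on any of this.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://cloud-api.gate.ac.uk/process/credibility-full-urls"


def build_request(text, key_id=None, key_password=None):
    """Build (but do not send) a POST request carrying one document."""
    headers = {
        # Assumed: plain-text body in, JSON annotations out.
        "Content-Type": "text/plain; charset=UTF-8",
        "Accept": "application/json",
    }
    if key_id and key_password:
        # Assumed: the API key pair is sent as HTTP Basic credentials.
        raw = f"{key_id}:{key_password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        ENDPOINT, data=text.encode("utf-8"), headers=headers, method="POST"
    )


def annotate(text, key_id=None, key_password=None, timeout=30):
    """Send one document and return the decoded JSON response."""
    request = build_request(text, key_id, key_password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder credentials; replace with a real API key ID and password.
    result = annotate("See http://example.com/story for details.",
                      key_id="YOUR_KEY_ID", key_password="YOUR_KEY_PASSWORD")
    print(json.dumps(result, indent=2))
```

Separating build_request from annotate keeps the network call in one place, so the headers and body can be inspected without contacting the service.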
