URL Domain Analysis
An application that finds the URLs in a document and annotates each one with what multiple organisations that analyse the credibility of online content have said about the URL's domain (or, in some cases, the domain and path). A full list of sources and data collection methods can be found here.
Note that this service does not rate the likely credibility of the specific URL provided; rather, it aggregates URLs into higher-level "scopes" and reports what the credibility information sources say about each scope as a whole. For most URLs the scope is the domain of the URL, e.g. all URLs under https://www.reuters.com will report the same credibility information. However, for certain social media platforms where many different individuals and organizations have an independent presence, the scope is the individual account name, such as x.com/BBC or tiktok.com/@channel4; the intention is that the scope represents the level at which a single organization has overall editorial control over the content published under that domain or account prefix.
The service currently provides account-scoped credibility information for X/Twitter, Facebook, TikTok, Telegram and VKontakte.
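As an illustration of the scoping rule described above, the following is a minimal sketch of how a scope could be derived from a URL. It is not the service's actual implementation: the function name `credibility_scope` and the host list `ACCOUNT_SCOPED_HOSTS` are assumptions made for this example, and the real service may recognise different host names or path patterns.

```python
from urllib.parse import urlsplit

# Assumed host names for the platforms listed above. The service's real list
# and matching rules are not documented here.
ACCOUNT_SCOPED_HOSTS = {
    "x.com", "twitter.com", "facebook.com", "tiktok.com", "t.me", "vk.com",
}


def credibility_scope(url: str) -> str:
    """Return a hypothetical scope: the host, or host/account on social platforms."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Compare against the host list without a leading "www." prefix.
    bare_host = host[4:] if host.startswith("www.") else host
    if bare_host in ACCOUNT_SCOPED_HOSTS:
        segments = [segment for segment in parts.path.split("/") if segment]
        if segments:
            # The first path segment is treated as the account name.
            return f"{bare_host}/{segments[0]}"
    # Everything else is scoped to the whole domain.
    return host


print(credibility_scope("https://www.reuters.com/world/some-article"))
print(credibility_scope("https://x.com/BBC/status/123"))
print(credibility_scope("https://www.tiktok.com/@channel4/video/1"))
```

Under these assumptions, the three calls give `www.reuters.com`, `x.com/BBC` and `tiktok.com/@channel4`, so every article under the Reuters domain shares one scope while each social media account gets its own.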
Default annotations

| Annotation or feature | Type and description |
| --- | --- |
| :URL | ([URL object]) details of all URLs found in the document. Features (or JSON object values) are: |
| :indices | ([int, int]) start and end position in the text where this URL is found |
| :string | (string) the URL itself |
| :resolved-url | (string) the URL once resolved to its final location after following any redirects |
| :resolved-domain | (string) domain of the URL being evaluated |
| :credibility-scope | (string) scope of any credibility information about this URL, either a domain name (example.com) or a domain-and-account prefix for supported social media platforms (x.com/BBC) |
| :SourceCredibility | ([SourceCredibility object]) details of all source credibility information found for the given URLs. Features (or JSON object values) are: |
| :indices | ([int, int]) start and end position in the text where this URL is found |
| :string | (string) the URL itself |
| :resolved-url | (string) the URL once resolved to its final location |
| :credibility-scope | (string) scope of any credibility information about this URL (see above) |
| :source | (string) the source from which this credibility information is taken |
| :source-type | (string) whether the source lists trusted media ("positive"), media where it may be worth being wary ("caution"), or both ("mixed") |
| :domainOrAccount | (string) whether the credibility information is about a domain or a social media page (e.g. Facebook, Twitter). Can be "domain" or "account" |
| :updated | (string, format: yyyyMMdd) when the data from this source was last updated |
| :labels* | (string) the labels given to the domain by this source |
| :description | (string) any extra information given by the source about the domain or path |
| :evidence* | ([string]) usually URLs giving potential evidence that the label given by the source is correct, comma separated |

(*Note: :labels and :evidence are currently duplicated by the deprecated :type and :debunks features respectively, which are to be removed in the next release.)
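Because :type and :debunks are deprecated duplicates, client code may want to read :labels and :evidence first and fall back to the old names only when the new ones are absent. The sketch below shows one way to do that and to parse the yyyyMMdd date; the helper names `_as_list` and `normalise_record` are hypothetical, and splitting on commas assumes that no individual label or evidence URL contains a comma.

```python
from datetime import datetime


def _as_list(value) -> list:
    """Accept a comma-separated string, a list, or None, and return a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [item.strip() for item in value.split(",") if item.strip()]
    return list(value)


def normalise_record(record: dict) -> dict:
    """Prefer 'labels'/'evidence', falling back to deprecated 'type'/'debunks'."""
    updated = record.get("updated")
    return {
        "source": record.get("source"),
        "labels": _as_list(record.get("labels", record.get("type"))),
        "evidence": _as_list(record.get("evidence", record.get("debunks"))),
        # 'updated' is documented as yyyyMMdd, e.g. "20240218".
        "updated": datetime.strptime(updated, "%Y%m%d").date() if updated else None,
    }


# Sample record using only field names and values that appear in this document.
sample = {
    "source": "OpenSources",
    "labels": "bias,rumor",
    "type": "bias,rumor",
    "updated": "20240218",
}
print(normalise_record(sample))
```

Reading the new names first means the same code keeps working once the deprecated features are removed.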
Example
Consider the following example URL:
https://bit.ly/3oZRRyt
The service first resolves this short URL to its final location (at the time of writing, a news article on https://www.dailytelegraph.com.au), recognises that "dailytelegraph.com.au" is the domain to look up, and sends back the following:
{
  "text": "https://bit.ly/3oZRRyt",
  "entities": {
    "URL": [
      {
        "indices": [0, 22],
        "string": "https://bit.ly/3oZRRyt",
        "resolved-url": "https://www.dailytelegraph.com.au/subscribe/news/1/?sourceCode=DTWEB_WRE170_a&dest=https://www.dailytelegraph.com.au/news/nsw/treasurer-matt-kean-will-hand-down-the-halfyearly-budget-review/news-story/cdaa6d5cc30a6dbc38951f4d44d53b09&memtype=anonymous&mode=premium&v21=dynamic-cold-control-noscore&V21spcbehaviour=append",
        "resolved-domain": "www.dailytelegraph.com.au",
        "credibility-scope": "www.dailytelegraph.com.au"
      }
    ],
    "SourceCredibility": [
      {
        "indices": [0, 22],
        "updated": "20240218",
        "source-type": "caution",
        "labels": "bias,rumor",
        "description": "australian tabloid mag",
        "type": "bias,rumor",
        "source": "OpenSources",
        "domainOrAccount": "domain",
        "string": "https://bit.ly/3oZRRyt",
        "resolved-url": "https://www.dailytelegraph.com.au/subscribe/news/1/?sourceCode=DTWEB_WRE170_a&dest=https://www.dailytelegraph.com.au/news/nsw/treasurer-matt-kean-will-hand-down-the-halfyearly-budget-review/news-story/cdaa6d5cc30a6dbc38951f4d44d53b09&memtype=anonymous&mode=premium&v21=dynamic-cold-control-noscore&V21spcbehaviour=append",
        "resolved-domain": "www.dailytelegraph.com.au",
        "credibility-scope": "www.dailytelegraph.com.au"
      },
      {
        "indices": [0, 22],
        "description": "This domain has appeared in 1 GDI media market based report(s)",
        "debunks": "https://disinformationindex.org/wp-content/uploads/2021/09/GDI_QUT-Australia-Disinformation-Risk-Assessment-Report-21.pdf",
        "source-type": "mixed",
        "labels": "present in GDI news reports",
        "evidence": "https://disinformationindex.org/wp-content/uploads/2021/09/GDI_QUT-Australia-Disinformation-Risk-Assessment-Report-21.pdf",
        "type": "present in GDI news reports",
        "updated": "20231130",
        "resolved-domain": "www.dailytelegraph.com.au",
        "credibility-scope": "www.dailytelegraph.com.au",
        "source": "GDI-MMR",
        "domainOrAccount": "domain",
        "string": "https://bit.ly/3oZRRyt",
        "resolved-url": "https://www.dailytelegraph.com.au/subscribe/news/1/?sourceCode=DTWEB_WRE170_a&dest=https://www.dailytelegraph.com.au/news/nsw/treasurer-matt-kean-will-hand-down-the-halfyearly-budget-review/news-story/cdaa6d5cc30a6dbc38951f4d44d53b09&memtype=anonymous&mode=premium&v21=dynamic-cold-control-noscore&V21spcbehaviour=append"
      }
    ]
  }
}
The service has found information about the Daily Telegraph (AU) in data collected from OpenSources and the GDI, and has returned the relevant details in the SourceCredibility objects.
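A response of this shape can be summarised per scope with a few lines of client code. The following is a rough sketch, assuming the response has already been parsed into a Python dict; `summarise_by_scope` is a hypothetical helper, and `sample_response` is a trimmed copy of the example above that keeps only the fields the helper reads.

```python
from collections import defaultdict


def summarise_by_scope(response: dict) -> dict:
    """Map each credibility scope to the (source, source-type) pairs reported for it."""
    summary = defaultdict(list)
    for record in response.get("entities", {}).get("SourceCredibility", []):
        summary[record["credibility-scope"]].append(
            (record["source"], record["source-type"])
        )
    return dict(summary)


# Trimmed version of the example response shown above.
sample_response = {
    "text": "https://bit.ly/3oZRRyt",
    "entities": {
        "SourceCredibility": [
            {
                "credibility-scope": "www.dailytelegraph.com.au",
                "source": "OpenSources",
                "source-type": "caution",
            },
            {
                "credibility-scope": "www.dailytelegraph.com.au",
                "source": "GDI-MMR",
                "source-type": "mixed",
            },
        ]
    },
}

for scope, reports in summarise_by_scope(sample_response).items():
    print(scope, reports)
```

Grouping by :credibility-scope rather than by :string means that several different URLs pointing at the same domain or account are reported together.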
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
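To stay within the stated rate, a client can space out its requests. The sketch below is a simple client-side throttle, not part of the service's API: the `Throttle` class is hypothetical, the figures 2 per second and 1,200 per day are taken from the text above, and how the service itself enforces its quota is not specified here.

```python
import time

DAILY_QUOTA = 1200        # free documents per day, as stated above
MAX_DOCS_PER_SECOND = 2   # average rate, as stated above


class Throttle:
    """Space out calls so they do not exceed a target rate (client-side only)."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may proceed

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


throttle = Throttle(MAX_DOCS_PER_SECOND)
# Typical use: call throttle.wait() before sending each document, and stop
# once DAILY_QUOTA documents have been sent in one day.
```

The clock and sleep functions are injectable so that the spacing logic can be checked without real delays.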
The API endpoint for this pipeline is: