solar1090
dexter API
General notes
Data format
All data is transferred in JSON format encoded as UTF-8. JSON objects may contain keys not listed in this documentation, but these should not be relied upon unless documented. Note that strings originating from web sites may contain lone Unicode surrogates after JSON decoding - e.g. "\ud800"
- and this may need to be taken into account when processing report data.
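As a rough illustration of the surrogate caveat above, the following Python sketch replaces lone surrogates before a decoded string is encoded or stored. The helper name `scrub_surrogates` and the choice of U+FFFD as the replacement are assumptions made for this example, not part of the API.

```python
import json
import re

# After json.loads, correctly paired surrogates have already been combined
# into a single code point, so anything still in this range is a lone one.
_LONE_SURROGATE = re.compile(r"[\ud800-\udfff]")


def scrub_surrogates(text: str, replacement: str = "\ufffd") -> str:
    """Replace lone surrogates so the string can be encoded as UTF-8."""
    return _LONE_SURROGATE.sub(replacement, text)


decoded = json.loads('"caf\\u00e9 \\ud800"')
safe = scrub_surrogates(decoded)
safe.encode("utf-8")  # the same call on `decoded` raises UnicodeEncodeError
```

Whether to replace, drop or preserve such characters depends on the consumer; replacement is only one option.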
Authentication
All API requests should use HTTP Basic Authentication with a username of dexter
and the API key as the password.
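A minimal sketch of building the corresponding Authorization header in Python, using only the standard library. The function name is an assumption for illustration; most HTTP clients can also build this header themselves when given the username and password.

```python
import base64


def basic_auth_header(api_key: str, username: str = "dexter") -> str:
    """Return the value of an HTTP Basic Authorization header."""
    credentials = f"{username}:{api_key}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```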
Errors
The following HTTP response codes may be returned:
200 OK
201 Created
204 No Content
400 Bad Request
401 Unauthorized
404 Not Found
405 Method Not Allowed
413 Payload Too Large
415 Unsupported Media Type
429 Too Many Requests
Errors will be accompanied by a response body as follows:
{
"errors": [ <error> ... ],
}
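The error body above might be handled along the following lines. This is a sketch only: it assumes each entry has the code and message properties of the error object described under "Objects", and the error code used in the usage line is made up.

```python
import json


def error_summary(status_code: int, body: str) -> list:
    """Return 'code: message' strings extracted from an error response body."""
    try:
        payload = json.loads(body)
    except ValueError:
        return [f"HTTP {status_code} (body was not valid JSON)"]
    errors = payload.get("errors", []) if isinstance(payload, dict) else []
    lines = [
        f"{e.get('code', '?')}: {e.get('message', '')}"
        for e in errors
        if isinstance(e, dict)
    ]
    # Fall back to the bare status code if no usable entries were present.
    return lines or [f"HTTP {status_code}"]


print(error_summary(404, '{"errors": [{"code": "not_found", "message": "No such report"}]}'))
```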
API calls
Reports
Queue a report
POST /reports
{
"callback": <url>,
"callbackId": <id>,
"config": <config>,
"lifetime": <integer>,
"metadata": <metadata>,
"requestedPages": <pages>,
"url": <url>,
}
Response:
201 Created
Location: /reports/<id>
{
"id": <id>,
"queued": <dateTime>,
}
The metadata
is optional additional data that is stored along with the report. Unlike the contents of reports themselves, the metadata can be updated later by the client.
requestedPages
is the maximum number of pages that will be tested. The number of pages examined will never be greater than this, but may be fewer (if the site contains fewer pages than this, or some other limit specified by the config
is hit first, e.g. maximum time taken to run report) or zero (e.g. if the initial URL for testing does not successfully return a page).
lifetime
is an optional number of days; the report has a guaranteed minimum lifetime of this many days from the point in time that it is complete. After the lifetime has passed, the report may be automatically deleted without warning, although it is not guaranteed that the report will be deleted as soon as the lifetime is up. The report should be deleted explicitly (see below) if a guarantee of its deletion is required. If the lifetime is not specified or is specified as zero then a default lifetime will be used - it is not guaranteed what the default will be but it will not be fewer than 30 days.
The callback
URL is used when the report is complete. The URL will receive a POST
request with the contents being the output of the "Fetch a report summary" API call as described below, for the status: "complete"
case. If the optional callbackId
parameter was supplied then it will be returned as part of the callback data. If the callback URL does not return a success status then it will be retried repeatedly for up to a period of 7 days, or lifetime
days, whichever is fewer.
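The request described above could be assembled as in this sketch, which builds (but does not send) a urllib request. The base URL `https://api.example.com`, the function name and the `Content-Type: application/json` header are assumptions for illustration. The documentation only marks some fields as optional; for simplicity this sketch omits any field other than url and requestedPages when it is not supplied.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.example.com"  # hypothetical base URL
_OPTIONAL_KEYS = ("callback", "callbackId", "config", "lifetime", "metadata")


def build_queue_request(api_key, url, requested_pages, **optional):
    """Build a POST /reports request; values left as None are omitted."""
    unknown = set(optional) - set(_OPTIONAL_KEYS)
    if unknown:
        raise TypeError(f"unexpected fields: {sorted(unknown)}")
    body = {"url": url, "requestedPages": requested_pages}
    for key in _OPTIONAL_KEYS:
        if optional.get(key) is not None:
            body[key] = optional[key]
    token = base64.b64encode(f"dexter:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        API_BASE + "/reports",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",  # assumed, not documented
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; a 201 response then carries the new report's id and its Location.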
List reports
GET /reports?status=<status>
Response:
{
"reports": {
<report-url>: { report status as per "Fetch a report summary" }
},
"truncated": true
}
Returns a list of the current status of all reports. If status
is omitted or empty then all reports will be returned, otherwise it can be one of "queued"
, "running"
, "callback"
or "complete"
to only return reports in that status. status
can be supplied multiple times to return reports in multiple statuses.
A maximum of 1,000 reports will be returned by any query. If the list of reports was truncated due to there being more than 1,000 to return then the value truncated
will be true, otherwise it will be absent.
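A small sketch of the query construction and truncation check described above. The function names are illustrative only.

```python
from urllib.parse import urlencode

KNOWN_STATUSES = ("queued", "running", "callback", "complete")


def list_reports_path(statuses=()):
    """Build the request path, repeating `status` once per requested status."""
    for status in statuses:
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
    query = urlencode([("status", status) for status in statuses])
    return "/reports?" + query if query else "/reports"


def is_truncated(response: dict) -> bool:
    """True if the listing hit the 1,000 report limit (the key is absent otherwise)."""
    return response.get("truncated") is True
```

When a listing is truncated, querying one status at a time is one way to reduce the size of each result set.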
Fetch a report summary
GET /reports/<id>
Response if the report is still queued waiting to run:
{
"id": <id>,
"callbackId": <id>,
"config": <config>,
"lifetime": <integer>,
"metadata": <metadata>,
"pages": 0,
"queued": <dateTime>,
"requestedPages": <integer>,
"status": "queued",
"url": <url>
}
The response if the report is currently running is as for "queued"
but with the following changes:
{
"pages": <integer>,
"start": <dateTime>,
"status": "running",
}
pages
is the number of pages that have been examined so far, and start
is the date and time that the report began running.
The response if the report has finished running is as for "queued"
but with the following changes:
{
"detail": <url>,
"finish": <dateTime>,
"pages": <integer>,
"start": <dateTime>,
"status": "callback",
"summary": <summary>
}
If the report has finished running and the callback URL has been successfully accessed, then the response is as for "callback"
but with the following changes:
{
"calledBack": <dateTime>,
"status": "complete"
}
"calledBack"
is the date and time at which the callback URL was successfully accessed. Note that this value will be absent if the report was moved to the "complete"
state due to the time limit expiring for retrying the callback URL.
For responses in the "callback"
and "complete"
states, the URL in detail
may be fetched to retrieve the full data of the report. Note that this URL does not require any authentication.
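The state-dependent fields above suggest client code along these lines. This is a sketch with assumed helper names; it only inspects a summary that has already been fetched and decoded.

```python
FINISHED_STATUSES = ("callback", "complete")


def detail_url(summary: dict):
    """Return the full-report URL once the report has finished, else None."""
    if summary.get("status") in FINISHED_STATUSES:
        return summary.get("detail")
    return None


def callback_succeeded(summary: dict) -> bool:
    """True only if the report is complete and the callback URL was reached.

    calledBack is absent when the report became complete because the
    callback retry period expired.
    """
    return summary.get("status") == "complete" and "calledBack" in summary
```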
Update a report
PUT /reports/<id>
{
"metadata": <metadata>,
"pages": <integer>,
"summary": <summary>,
"detail": <url>
}
Response:
{ report status as per "Fetch a report summary" }
Note that, unless the user accessing the API is a queue runner, only the metadata of a report can be updated; all other values associated with a report, once it has reached status "complete"
, are immutable (excepting that the report may be deleted).
A queue runner user may update "pages"
to indicate progress during the report execution, and "summary"
and "detail"
to indicate that the report is complete.
Delete/unqueue a report
DELETE /reports/<id>
Response:
{ report status as per "Fetch a report summary" }
The report status
returned is the status before it was deleted - if the report was still waiting to run then it is simply removed from the queue; if the report was running then it is cancelled and any results discarded.
The number of pages indicated by pages
is approximately the number of pages that were examined, i.e. the number of pages that will be billed. If the status was "queued"
then the number of pages will be zero. If the status was "running"
or "complete"
then the number of pages will be between zero and requestedPages
.
Note that if a report is deleted before it completes then its callback URL will not be called.
Data formats
Basic types
category
The category of a diagnostic. Current categories include "accessibility"
, "brand"
, "code"
, "email"
, "links"
, and "seo"
, but software must be prepared to receive other category names also.
dateTime
Date/times are stored as strings in RFC 3339 format; the timezone is always UTC. e.g. "2018-02-28T13:53:14.12Z"
. Note that the number of digits in the fractional part of the seconds may vary and the fractional part may be entirely absent.
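Because the fractional part may have any number of digits or be absent, a strict fixed-format parser can fail on valid values. The following sketch parses the format with a regular expression. It assumes the value always ends in an upper-case "Z", as in the example above, and it truncates fractions beyond microsecond precision.

```python
import re
from datetime import datetime, timezone

_DATETIME = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?Z"
)


def parse_datetime(value: str) -> datetime:
    """Parse a UTC dateTime string with an optional, variable-length fraction."""
    match = _DATETIME.fullmatch(value)
    if match is None:
        raise ValueError(f"not a recognised dateTime: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    fraction = match.group(7) or ""
    # Pad or truncate the fraction to exactly six digits (microseconds).
    microsecond = int((fraction + "000000")[:6])
    return datetime(year, month, day, hour, minute, second, microsecond,
                    tzinfo=timezone.utc)
```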
severity
The severity level of a diagnostic - one of "debug"
, "info"
, "minor"
, "moderate"
, "serious"
, "critical"
, or "alert"
. "debug"
and "info"
do not indicate a fault, they just convey information. "alert"
indicates a fault so serious and urgent that it may warrant proactively alerting a user to the problem.
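A sketch of how a client might rank severities. It assumes that the order in which the levels are listed above runs from least to most severe, which the text implies but does not state outright; the helper names are illustrative.

```python
SEVERITY_ORDER = ("debug", "info", "minor", "moderate", "serious", "critical", "alert")


def severity_rank(level) -> int:
    """Position of a level in SEVERITY_ORDER, or -1 if it is not recognised."""
    try:
        return SEVERITY_ORDER.index(level)
    except ValueError:
        return -1


def is_fault(level) -> bool:
    """"debug" and "info" only convey information; higher levels are faults."""
    return severity_rank(level) >= SEVERITY_ORDER.index("minor")


def worst(diagnostics):
    """Return the diagnostic with the highest severity, or None if empty."""
    return max(diagnostics, key=lambda d: severity_rank(d.get("level")), default=None)
```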
diagType
The type of a diagnostic - one of "file"
, "link"
, "transport"
or "url"
.
"file"
means that the diagnostic relates to the contents of the file - e.g. invalid HTML, or a corrupt image file.
"link"
means that the diagnostic relates to the way the URL or the file it points to was linked in this specific instance - e.g. a non-image file used as an image in an HTML page, or a link that contains a URL fragment identifier that could not be found in the target URL's contents.
"transport"
means that the diagnostic relates to the process of fetching the URL, but should not be considered a problem for pages that reference the URL - e.g. a valid HTTP response was returned but it contained a non-fatal problem such as an invalid Set-Cookie header.
"url"
means the diagnostic relates to the URL itself or the process of fetching the URL and should be considered a problem for pages that reference the URL - e.g. the URL is malformed, or the hostname could not be found in the DNS, or the TCP connection was refused, or an HTTP error status was returned.
id
IDs are opaque strings of between 1 and 255 printable non-whitespace ASCII characters (i.e. characters 33-126).
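The constraint above translates directly into a regular expression. A minimal sketch, with an assumed function name, that a client could use to validate a callbackId before sending it:

```python
import re

# Characters 33-126 are the printable, non-whitespace ASCII characters.
_ID = re.compile(r"[\x21-\x7e]{1,255}")


def is_valid_id(value) -> bool:
    """Check that a value is a string of 1 to 255 printable ASCII characters."""
    return isinstance(value, str) and _ID.fullmatch(value) is not None
```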
linkAs
The type of resource that a URL has been linked as. Possible values include, but are not limited to, "document"
, "object"
, "embed"
, "media"
, "font"
, "image"
, "script"
, "style"
.
metadata
A JSON object up to a serialized length of 64 kilobytes. Metadata is treated as opaque by the system and will not be interpreted in any way, except inasmuch as it will be checked to ensure it is a syntactically-valid JSON object.
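A client may want to check metadata before submitting it. The sketch below assumes that "64 kilobytes" means 65,536 bytes and that the compact serialization produced here is comparable to what the server measures; neither is confirmed by this documentation. The default ASCII-escaped output of `json.dumps` is never shorter than the equivalent UTF-8, so the estimate errs on the cautious side.

```python
import json

MAX_METADATA_BYTES = 64 * 1024  # assumed interpretation of "64 kilobytes"


def check_metadata(metadata) -> int:
    """Return the serialized size of the metadata, or raise if it is unusable."""
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a JSON object (a dict)")
    size = len(json.dumps(metadata, separators=(",", ":")))
    if size > MAX_METADATA_BYTES:
        raise ValueError(f"metadata is {size} bytes, over the assumed limit")
    return size
```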
status
The status of a report. One of "queued"
, "running"
, "callback"
or "complete"
.
template
A template
is a string that may contain substrings matching /{([a-z][a-z0-9_]*)?(:[a-z][a-z0-9_]*)?}/
, which should be substituted before being displayed to the user. The substring "{}"
should be replaced with just "{"
. The value to substitute should be looked up in the appropriate resolution path, which is a list of objects to search for a property of the name indicated by the first capture group. The first match in the resolution path is used, except that values that are undefined
or null
should be ignored. Objects in the resolution path that are undefined
or null
should also be ignored.
The second capture group, if present, indicates formatting that should be applied to display the value. Defined formats are shown below; unrecognised formats should be ignored and the value displayed as a plain string.
:duration
- the value is a number of seconds (either integer or floating point) which should be displayed in a human-friendly way depending on its magnitude, e.g. "20ms", "45s", "2h", etc.
:size
- the value is a number of bytes, which should be displayed in a human-friendly way depending on its magnitude, e.g. by appending "B", "kB", "MB", etc as appropriate.
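The substitution rules above can be sketched in Python as follows. Several choices here are assumptions rather than documented behaviour: None stands in for both undefined and null, placeholders with no resolvable value are left unchanged, durations are rounded to the largest whole unit (with a year taken as 31,556,952 seconds), and sizes use multiples of 1,024.

```python
import re

_PLACEHOLDER = re.compile(r"\{([a-z][a-z0-9_]*)?(:[a-z][a-z0-9_]*)?\}")
_DURATION_UNITS = (("y", 31556952), ("d", 86400), ("h", 3600), ("m", 60), ("s", 1))
_SIZE_UNITS = (("GB", 1024 ** 3), ("MB", 1024 ** 2), ("kB", 1024))


def format_duration(seconds) -> str:
    """Render a number of seconds using the largest unit that fits."""
    if seconds < 1:
        return f"{round(seconds * 1000)}ms"
    for suffix, unit in _DURATION_UNITS:
        if seconds >= unit:
            return f"{round(seconds / unit)}{suffix}"


def format_size(size) -> str:
    """Render a number of bytes using the largest unit that fits."""
    for suffix, unit in _SIZE_UNITS:
        if size >= unit:
            return f"{round(size / unit)}{suffix}"
    return f"{size}B"


_FORMATS = {":duration": format_duration, ":size": format_size}


def lookup(name, path):
    """Return the first non-None value for `name` in the resolution path."""
    for obj in path:
        if obj is None:
            continue
        value = obj.get(name)
        if value is not None:
            return value
    return None


def resolve(template: str, path) -> str:
    """Substitute every placeholder in `template` using the resolution path."""
    def substitute(match):
        name, fmt = match.groups()
        if name is None and fmt is None:
            return "{"  # "{}" stands for a literal "{"
        if name is None:
            return match.group(0)
        value = lookup(name, path)
        if value is None:
            return match.group(0)  # assumption: leave unresolved text as-is
        # Unrecognised (or absent) formats fall back to a plain string.
        return _FORMATS.get(fmt, str)(value)

    return _PLACEHOLDER.sub(substitute, template)
```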
Example:
"Hello {name} you are {age:duration} and you like {cook} {food}"
resolved with the resolution path:
[
undefined,
{
"cook": null,
"food": "eggs",
"name": "Michael"
},
null,
{
"age": 1262278080,
"cook": "fried",
"food": "ham"
}
]
will produce:
"Hello Michael you are 40y and you like fried eggs"
url
URLs are always stored in canonical form. This means, for example, that hostnames are lower-cased and converted to Unicode form - so café.example.com
would be stored in that form and not as café.EXAMPLE.COM
or xn--caf-dma.example.com
. Thus to determine if two URLs are the same, simple string comparison can be used.
Objects
box
{
"height": <integer>,
"width": <integer>,
"x": <integer>,
"y": <integer>
}
The bounding box of an element on the page.
diagnostic
{
"category": <string>,
"coords": [ <box> ],
"detail": <template>,
"extract": <string>,
"level": <severity>,
"image": <url>,
"message": <template>,
"module": <string>,
"name": <string>,
"page": <number>,
"parameters": <object>,
"selector": <string>,
"source": <sourceInfo>,
"subcategory": <string>,
"tag": <string>,
"type": <diagType>
}
A diagnostic. Templates should be resolved with a path that includes the parameters
property (if present), followed by the diagnostic
object itself, followed by the link
or urlData
object that contains this diagnostic.
detail
(if present) provides detailed information about the diagnostic in markdown format.
extract
(if present) shows a short extract of the document source surrounding the problem code.
message
is a short one-line description of the diagnostic.
module
is the name of the module that generated the diagnostic.
page
is the 1-based index of the page on which the diagnostic occurs, if that is a meaningful concept for the format of the source file (e.g. for PDF it is, for HTML it is not).
category
, subcategory
(if present), and name
together produce a unique identifier for this category of diagnostic.
If the diagnostic is under a urlData
object, and type
is "file"
, then (if present) selector
is a CSS selector, which identifies the element or elements involved in the diagnostic, and coords
is an array which provides the bounding box on the page for each of those elements (excluding any that do not have a visible bounding box).
If the diagnostic is under a urlData
object, and type
is "file"
, then (if present) source
is a sourceInfo
object indicating which parts of the file source are relevant to the diagnostic.
image
is the URL of an image related to the diagnostic, if there is a related image (for example, if it relates to an HTML <img>
element).
tag
is the element name related to the diagnostic, if it relates to an HTML or other XML-like element. The value will always be in lower case.
Example:
{
"category": "link",
"level": "serious",
"message": "Broken link ({status} {message})",
"module": "link",
"name": "notfound",
"parameters": {
"status": 404,
"message": "File not found"
},
"type": "url"
}
error
{
"code": <string>,
"message": <string>
}
An error from the system API (as opposed to an error being reported about a site under test). code
is a short error code for programmatic use, and message
is a human-readable description of the error.
linkData
{
"accessibleDescription": <string>,
"accessibleName": <string>,
"as": <linkAs>,
"coords": [ <box> ],
"diagnostics": [ <diagnostic> ],
"extract": <string>,
"fragment": <string>,
"interaction": <boolean>,
"module": <string>,
"page": <integer>,
"redirect": <string>,
"rel": <string>,
"selector": <string>,
"tag": <string>,
"text": <string>
}
A link from one resource to another. All the properties are optional.
as
is a linkAs
string value, indicating the purpose for which the URL is being fetched - e.g. "image"
if it's an HTML <img>
tag. If as
is omitted this implies its value is document
.
extract
(if present) shows a short extract of the document source surrounding the code that creates the link.
fragment
is the fragment identifier of the link target. It does not include the leading #
.
interaction
is true
if the link requires user interaction before the URL will be fetched (e.g. a clickable hyperlink) or false
if the URL will be fetched automatically without user interaction (e.g. an inline image). If interaction
is omitted this implies its value is false
.
module
is the name of the module that detected the link.
page
is the 1-based index of the page on which the link occurs, if that is a meaningful concept for the format of the source file (e.g. for PDF it is, for HTML it is not).
redirect
is present if this link was a redirect. Possible values are "temporary"
for a temporary redirect such as 302, "permanent"
for a permanent redirect such as 301, or "unknown"
for some other type of redirect.
rel
has a meaning similar to the HTML rel
attribute on the a
element.
selector
is a CSS selector that identifies the element or elements on the page that created the link. coords
is an array which provides the bounding box on the page for each of those elements.
tag
is the element name that produced the link, if it was produced by an HTML or other XML-like element link. The value will always be in lower case.
text
is the displayed clickable text of a hypertext link. accessibleName
is the "accessible name" of the link, and accessibleDescription
is the "accessible description". Neither of these is included if they are the empty string, and accessibleName
is not included if it is identical to text
(ignoring differences in white-space).
Example:
<a href="http://example.com#main" title="Example">click here</a>
{
"fragment": "main",
"interaction": true,
"selector": "a[href=http://example.com#main]",
"tag": "a",
"text": "click here"
}
Example:
<link rel=stylesheet href="main.css">
{
"as": "style",
"selector": "link[rel=stylesheet][href=main.css]",
"tag": "link"
}
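Since as and interaction have implied values when omitted, it can be convenient to fill them in before processing links. A minimal sketch, with an assumed helper name:

```python
def normalize_link(link: dict) -> dict:
    """Return a copy of a linkData object with the implied defaults made explicit."""
    normalized = dict(link)
    normalized.setdefault("as", "document")
    normalized.setdefault("interaction", False)
    return normalized
```

Applied to the two examples above, the hyperlink gains "as": "document" and the stylesheet link gains "interaction": false.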
report
{
"data": <object>,
"pages": [ <url> ],
"summary": <summary>,
"urls": { <url>: <urlData> }
}
The full output of a report. summary
is the report summary that can be fetched separately via the API above. pages
is a list of the URLs of the HTML pages that were successfully fetched and tested, in the order that they were visited. Each page url can be looked up in the urls
object. urls
is a mapping from URLs to the results of testing those URLs. data
is a mapping from module names to objects containing data recorded by that module. The data recorded depends on the module in question.
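As an illustration of how pages and urls relate, this sketch walks the pages in visit order and counts the diagnostics attached directly to each one. The function name is an assumption, and diagnostics attached to individual links are deliberately not counted.

```python
def diagnostics_per_page(report: dict) -> list:
    """Return (page URL, diagnostic count) pairs in the order pages were visited."""
    urls = report.get("urls", {})
    counts = []
    for page_url in report.get("pages", []):
        url_data = urls.get(page_url, {})
        counts.append((page_url, len(url_data.get("diagnostics", []))))
    return counts
```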
sourceInfo
{
"firstByte": <integer>,
"firstColumn": <integer>,
"firstLine": <integer>,
"lastByte": <integer>,
"lastColumn": <integer>,
"lastLine": <integer>
}
Identifies the section of a text-based file (e.g. HTML) relevant to a diagnostic.
summary
{
"base": <url>,
"config": <config>,
"engine": {
"browser": <string>,
"name": "dexter",
"version": <string>
},
"finish": <dateTime>,
"limits": [ <string> ],
"pages": <integer>,
"pageTypes": { <string>: <integer> },
"requestedPages": <integer>,
"start": <dateTime>,
"urls": <integer>
}
The summary data about a report.
base
is the base URL of the report.
start
and finish
are the start and finish times of the test.
pages
is the number of pages that were successfully fetched and tested; urls
is the number of URLs that were fetched (or attempted to be fetched). pageTypes
indicates how many of each type of page (e.g. "html", "pdf") were tested.
urlData
{
"contentTested": <boolean>,
"data": <object>,
"description": <string>,
"diagnostics": [ <diagnostic> ],
"finish": <dateTime>,
"initialTime": <integer>,
"links": { <url>: [ <linkData> ] },
"location": <url>,
"mimeParams": { <string>: <string> },
"mimeType": <string>,
"ok": <boolean>,
"page": <boolean>,
"pageLoadTime": <integer>,
"screenshot": {
"additional": {
<string>: {
"height": <integer>,
"storage": <url>,
"width": <integer>
}
},
"height": <integer>,
"storage": <url>,
"width": <integer>
},
"sha3Hash": <string>,
"size": <integer>,
"start": <dateTime>,
"storage": <url>,
"title": <string>,
"totalTime": <integer>,
"transportSize": <integer>,
}
The results of testing a URL. All properties are optional except start
and finish
, which are the start and finish times of the testing of the URL.
If contentTested
is true then the contents of the resource itself were checked; otherwise the link checker merely checked the URL was not returning an error.
data
is a mapping from module names to objects containing data recorded by that module. The data recorded depends on the module in question.
description
is the metadata "description" of the resource, if appropriate and present - e.g. the <meta name=description>
tag in HTML.
diagnostics
is a list of diagnostics found relating directly to this URL.
initialTime
is the time-to-first-byte of the URL fetch, totalTime
is the time-to-last-byte. pageLoadTime
is the time to load the URL as a page in the browser (from the start of the page fetch until the 'load' event is fired). All these values are integers in milliseconds.
links
is a mapping indicating URLs that were found linked from this resource to a list of linkData
objects. The URLs have their fragments stripped; fragment information is included in the linkData
object. The value is a list because there may be multiple links (potentially of different types) from a resource to the same URL.
location
is only present if the URL is a redirect, and indicates the final target of the redirection - i.e. if URL A redirects to URL B which redirects to URL C, then the location
for URL A will be URL C. The immediate target of the redirection (i.e. URL B) will be available from the links
object.
mimeType
is the MIME type of the resource, if provided by the server - e.g. "text/html"
. It is always in lower case. mimeParams
contains other parameters from the MIME Content-Type. Not all parameters are guaranteed to be stored, but charset
will be stored if it was provided.
ok
is true
if the resource returned a 'success' HTTP response (2xx or 3xx), or false
if another HTTP status was received.
If page
is true then the URL counted towards the total of tested pages, and will be listed in the pages
property of the top-level report object.
screenshot
, if present, specifies a URL at which a width
x height
pixel screenshot of the page may be downloaded. storage
, if present, specifies a URL at which the original file contents can be downloaded. additional
, if present, can contain zero or more additional variant screenshots, e.g. of the page viewed while emulating a mobile device.
sha3Hash
, if present, is the SHA3-256 hash of the file contents as a hexadecimal string.
size
is the size of the resource in bytes. transportSize
is the size of the resource as transmitted over the network (e.g. if the resource was sent using gzip encoding this may be smaller than size
). If transportSize
is missing it is the same as size
.
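The fallback rule for transportSize can be captured in a small helper. This is a sketch with assumed names; both properties are optional, so the result may be None.

```python
def transport_size(url_data: dict):
    """Bytes sent over the network; falls back to size when transportSize is absent."""
    if "transportSize" in url_data:
        return url_data["transportSize"]
    return url_data.get("size")


def bytes_saved(url_data: dict):
    """Bytes saved by transfer encoding, or None if the size is unknown."""
    size = url_data.get("size")
    if size is None:
        return None
    return size - transport_size(url_data)
```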
title
is the title of the resource, if appropriate - e.g. the contents of the <title>
tag in HTML.
Configuration
Syntax
Configuration = ConfigItem ( ( ";" | "\n" ) ConfigItem )*
ConfigItem = Assignment | Condition "{" Configuration "}" | <empty>
Condition = URLPattern | RegExp
Assignment = Name | "!" Name | Name ( "=" | "+=" | "-=" ) Value
Name = ( Alpha | "_" ) ( Alpha | Digit | "_" )* ("." Name)*
Value = List | SingleValue
List = "[" SingleValue ( "," SingleValue )* "]"
SingleValue = Boolean | String | RegExp | Duration | Size | Integer
Whitespace (spaces and tabs) is allowed between syntactic tokens, except inside a Name
. Wherever whitespace is allowed, a comment block may be inserted with /* ... */
, or the rest of a line can be commented out with //
.
Configuration can be global, for all URLs in the test, or can be made conditional and only active for certain URLs. The conditions can be based upon URLPatterns or RegExps. For example:
!caseSensitive // treat all URLs as not case sensitive
http://example.com/private/* {
/* URLs under http://example.com/private
are case sensitive */
caseSensitive
}
/\.png$/ {
!include // completely exclude URLs ending in ".png" from the test
}
Conditional blocks can be nested, and rules inside the nested blocks will only apply to URLs that match all the nested conditions. For example:
http://example.org/* {
/\.png$/ {
/* exclude URLs ending in .png, but only if they are under
http://example.org/ */
!include
}
}
Boolean
Boolean values can be assigned either true
or false
. They can also be set to true
by just specifying the item name, or false by specifying an !
followed by the item name.
Examples:
base.startpoint = false
base.startpoint // set base.startpoint to true
!base.startpoint // set base.startpoint to false
String
Strings start and end with a '
, and can contain backslash escapes:
\\
: a literal \
\n
: a newline
\'
: a literal '
\xXX
: the character with code point XX (hexadecimal)
Examples:
'hello'
'i like \' apostrophes'
'i don\x27t like \\ backslashes'
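The escapes listed above can be decoded with a single regular expression pass, as in this sketch. It is not a full validator: malformed escapes and unescaped quotes inside the literal are not rejected, and the function name is an assumption.

```python
import re

# One backslash followed by: another backslash, n, an apostrophe, or xXX.
_ESCAPE = re.compile(r"\\(\\|n|'|x[0-9A-Fa-f]{2})")


def decode_config_string(literal: str) -> str:
    """Decode a single-quoted configuration string literal."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("string literals must start and end with '")

    def replace(match):
        escape = match.group(1)
        if escape == "n":
            return "\n"
        if escape[0] == "x":
            return chr(int(escape[1:], 16))
        return escape  # a literal backslash or apostrophe

    return _ESCAPE.sub(replace, literal[1:-1])
```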
RegExp
Regular expressions are the same as JavaScript regular expression literals. The only valid flag is i
for case insensitivity.
Example:
/[a-z]+/i
Duration
A duration in seconds. The value consists of one or more integers, each followed by a unit specifier - ms
for milliseconds, s
for seconds, m
for minutes, h
for hours, d
for days and w
for weeks.
Examples:
500ms // 500 milliseconds, i.e. 0.5 seconds
10m // 10 minutes, i.e. 600 seconds
1d8h // 1 day 8 hours, i.e. 115,200 seconds
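A sketch of parsing this syntax into seconds. The total is accumulated in whole milliseconds to avoid rounding drift; the function name is illustrative.

```python
import re

_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000,
            "d": 86_400_000, "w": 604_800_000}
# "ms" is listed first so that it is tried before "m" and "s".
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w)")
_WHOLE = re.compile(r"(?:\d+(?:ms|s|m|h|d|w))+")


def parse_duration(text: str) -> float:
    """Convert a duration such as '1d8h' into a number of seconds."""
    if _WHOLE.fullmatch(text) is None:
        raise ValueError(f"not a valid duration: {text!r}")
    total_ms = sum(int(number) * _UNIT_MS[unit]
                   for number, unit in _PART.findall(text))
    return total_ms / 1000
```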
Size
A size in bytes. The value consists of an integer, followed by an optional unit specifier (k
for kilobytes, M
for megabytes, or G
for gigabytes), followed by a B
.
Examples:
7B // 7 bytes
12kB // 12 kilobytes, i.e. 12,288 bytes
1MB // 1 megabyte, i.e. 1,048,576 bytes
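The same approach works for sizes. The examples above show that k and M are multiples of 1,024; this sketch assumes that G follows the same pattern.

```python
import re

_SIZE = re.compile(r"(\d+)([kMG]?)B")
_MULTIPLIER = {"": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '12kB' into a number of bytes."""
    match = _SIZE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid size: {text!r}")
    number, unit = match.groups()
    return int(number) * _MULTIPLIER[unit]
```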
Integer
A positive (or zero) integer. Leading zeroes are permitted.
URLPattern
Scheme '://' Host ( ':' Port )? '/' Path?
The scheme can be either http
, https
, or *
to indicate either.
The host can be a simple string, to match that hostname, or *.
followed by a simple string, to indicate that hostname and subdomains thereof, or .
to match the hostname of the base URL of the test, or *
to match any host.
The port can be omitted, to match the default port for the URL's scheme, or an integer value to match that specific port, or *
to match any port.
The path is a string to match the URL's whole path. It can contain *
characters as a wildcard.
Examples:
// matches any URL:
*://*:*/*
// matches any https: URL on port 443 at the base hostname of the test:
https://./*
// matches any URL with hostname 'example.com' and the default port:
*://example.com/*
// matches any URL with hostname 'example.com' or a subdomain thereof,
// e.g. 'www.example.com', and the default port:
*://*.example.com/*
// matches the exact URL http://example.com/foo/bar:
http://example.com/foo/bar
// matches any URL with a path beginning '/foo' and ending 'bar':
*://*:*/foo*bar
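The matching rules above can be approximated in Python as follows. This is a sketch under several assumptions: only http and https URLs are considered, the query string is ignored when matching the path, hostnames are compared case-insensitively, and `base_host` is supplied by the caller as the hostname of the test's base URL. The real implementation may differ on these points.

```python
import re
from urllib.parse import urlsplit

_PATTERN = re.compile(r"(\*|https?)://([^/:]+)(?::(\*|\d+))?/(.*)")
_DEFAULT_PORTS = {"http": 80, "https": 443}


def _host_matches(pattern_host: str, host: str, base_host: str) -> bool:
    if pattern_host == "*":
        return True
    if pattern_host == ".":
        return host == base_host
    if pattern_host.startswith("*."):
        suffix = pattern_host[2:]
        return host == suffix or host.endswith("." + suffix)
    return host == pattern_host


def _path_regex(pattern_path: str):
    # Each "*" becomes ".*"; everything else is matched literally.
    literal_parts = (re.escape(part) for part in pattern_path.split("*"))
    return re.compile(".*".join(literal_parts), re.DOTALL)


def url_pattern_matches(pattern: str, url: str, base_host: str) -> bool:
    """Return True if `url` matches the URLPattern `pattern`."""
    match = _PATTERN.fullmatch(pattern)
    if match is None:
        raise ValueError(f"not a valid URLPattern: {pattern!r}")
    scheme, host, port, path = match.groups()
    parts = urlsplit(url)
    if parts.scheme not in _DEFAULT_PORTS:
        return False
    if scheme != "*" and scheme != parts.scheme:
        return False
    if not _host_matches(host.lower(), parts.hostname or "", base_host.lower()):
        return False
    default_port = _DEFAULT_PORTS[parts.scheme]
    url_port = parts.port or default_port
    if port is None:
        if url_port != default_port:  # omitted port means the default port
            return False
    elif port != "*" and int(port) != url_port:
        return False
    return _path_regex("/" + path).fullmatch(parts.path or "/") is not None
```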