solar1090

dexter API

General notes

Data format

All data is transferred in JSON format encoded as UTF-8. JSON objects may contain keys not listed in this documentation; such keys should not be relied upon. Note that strings originating from web sites may contain lone Unicode surrogates after JSON decoding (e.g. "\ud800"), and this may need to be taken into account when processing report data.
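As an illustration of why lone surrogates matter: Python's json.loads accepts "\ud800" and returns a str containing it, but encoding that str back to UTF-8 then fails. The minimal sketch below replaces any unpaired surrogate with U+FFFD after decoding. The helper name scrub is hypothetical, and substituting U+FFFD is only one possible policy; a client might prefer to keep or log the raw value instead.

```python
import json
from typing import Any


def _replace_lone_surrogates(text: str) -> str:
    # json.loads already joins valid surrogate pairs into a single code point,
    # so any code point still in U+D800..U+DFFF here is unpaired.
    return "".join("\ufffd" if 0xD800 <= ord(ch) <= 0xDFFF else ch for ch in text)


def scrub(value: Any) -> Any:
    """Return a copy of decoded JSON with lone surrogates replaced by U+FFFD."""
    if isinstance(value, str):
        return _replace_lone_surrogates(value)
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, dict):
        # Keys are scrubbed too, since URL keys also originate from web sites.
        return {scrub(key): scrub(item) for key, item in value.items()}
    return value


raw = '{"title": "caf\\u00e9 \\ud800 \\ud83d\\ude00"}'
data = json.loads(raw)
try:
    data["title"].encode("utf-8")
except UnicodeEncodeError:
    print("title contains a lone surrogate and cannot be encoded as UTF-8")
clean = scrub(data)
clean["title"].encode("utf-8")  # succeeds after scrubbing
```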

Authentication

All API requests should use HTTP Basic Authentication with a username of dexter and the API key as the password.
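For reference, a minimal Python sketch of the header value this implies: standard HTTP Basic Authentication base64-encodes "username:password", so here it encodes "dexter:" followed by the API key. The function name basic_auth_header and the key "my-api-key" are placeholders for illustration.

```python
import base64


def basic_auth_header(api_key: str) -> str:
    """Build the Authorization header value for the dexter API."""
    # HTTP Basic Authentication: base64 of "username:password", where the
    # username is always "dexter" and the password is the API key.
    credentials = f"dexter:{api_key}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("my-api-key")  # placeholder key, not a real one
```

The result is sent as the Authorization request header on every call.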

Errors

The following HTTP response codes may be returned:

200 OK
201 Created
204 No Content
400 Bad Request
401 Unauthorized
404 Not Found
405 Method Not Allowed
413 Payload Too Large
415 Unsupported Media Type
429 Too Many Requests

Errors will be accompanied by a response body as follows:

{
    "errors": [ <error> ... ]
}

API calls

Reports

Queue a report

POST /reports

{
    "callback": <url>,
    "callbackId": <id>,
    "config": <config>,
    "lifetime": <integer>,
    "metadata": <metadata>,
    "requestedPages": <pages>,
    "url": <url>
}

Response:

201 Created
Location: /reports/<id>

{
    "id": <id>,
    "queued": <dateTime>
}

The metadata is optional additional data that is stored along with the report. Unlike the contents of reports themselves, the metadata can be updated later by the client.

requestedPages is the maximum number of pages that will be tested. The number of pages examined will never be greater than this, but may be fewer (if the site contains fewer pages than this, or some other limit specified by the config is hit first, e.g. the maximum time allowed for the report to run) or zero (e.g. if the initial URL for testing does not successfully return a page).

lifetime is an optional number of days; the report has a guaranteed minimum lifetime of this many days from the point in time that it is complete. After the lifetime has passed, the report may be automatically deleted without warning, although it is not guaranteed that the report will be deleted as soon as the lifetime is up. The report should be deleted explicitly (see below) if a guarantee of its deletion is required. If the lifetime is not specified or is specified as zero then a default lifetime will be used - it is not guaranteed what the default will be but it will not be fewer than 30 days.

The callback URL is called when the report has finished running. It will receive a POST request whose body is the output of the "Fetch a report summary" API call described below, as for the status: "complete" case. If the optional callbackId parameter was supplied then it will be returned as part of the callback data. If the callback URL does not return a success status then the request will be retried repeatedly for up to 7 days, or lifetime days, whichever is shorter.
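A minimal Python sketch of building (not sending) a "Queue a report" request with urllib.request. Several things here are assumptions not stated in this document: the service root API_ROOT is a made-up placeholder, the Content-Type is assumed to be application/json (the API can return 415 Unsupported Media Type, which suggests it is checked), and config is omitted because its wire encoding is not shown here. Optional fields are only sent when supplied.

```python
from __future__ import annotations

import base64
import json
import urllib.request

# Hypothetical service root: this document does not state the API's host name.
API_ROOT = "https://dexter.example/api"


def queue_report_request(api_key: str, url: str, requested_pages: int,
                         callback: str | None = None,
                         callback_id: str | None = None,
                         lifetime: int | None = None,
                         metadata: dict | None = None) -> urllib.request.Request:
    """Build (but do not send) a POST /reports request."""
    body = {"url": url, "requestedPages": requested_pages}
    optional = {"callback": callback, "callbackId": callback_id,
                "lifetime": lifetime, "metadata": metadata}
    # Only include the optional fields that were actually supplied.
    body.update({key: value for key, value in optional.items() if value is not None})
    token = base64.b64encode(f"dexter:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        API_ROOT + "/reports",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Authorization": "Basic " + token,
                 "Content-Type": "application/json"},
    )


req = queue_report_request("my-api-key", "https://example.com/", 100,
                           callback="https://client.example/done",
                           callback_id="job-42", metadata={"customer": 7})
```

Passing req to urllib.request.urlopen should then yield 201 Created, with the new report's id in the JSON body and its path in the Location header.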

List reports

GET /reports?status=<status>

Response:

{
    "reports": {
        <report-url>: { report status as per "Fetch a report summary" }
    },
    "truncated": true
}

Returns the current status of reports, keyed by report URL. If status is omitted or empty then all reports will be returned; otherwise it can be one of "queued", "running", "callback" or "complete" to return only reports in that status. status can be supplied multiple times to return reports in multiple statuses.

A maximum of 1,000 reports will be returned by any query. If the list of reports was truncated due to there being more than 1,000 to return then the value truncated will be true, otherwise it will be absent.
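A small Python sketch of the two details above: repeating the status query parameter, and treating an absent truncated key as false. The helper names list_reports_path and is_truncated are hypothetical.

```python
from urllib.parse import urlencode


def list_reports_path(*statuses: str) -> str:
    """Path for GET /reports, repeating the status parameter once per value."""
    if not statuses:
        return "/reports"  # no status filter: all reports
    # doseq=True expands the list into status=a&status=b.
    return "/reports?" + urlencode({"status": list(statuses)}, doseq=True)


def is_truncated(response: dict) -> bool:
    """True if the 1,000-report cap was hit; the key is absent otherwise."""
    return response.get("truncated", False) is True
```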

Fetch a report summary

GET /reports/<id>

Response if the report is still queued waiting to run:

{
    "id": <id>,
    "callbackId": <id>,
    "config": <config>,
    "lifetime": <integer>,
    "metadata": <metadata>,
    "pages": 0,
    "queued": <dateTime>,
    "requestedPages": <integer>,
    "status": "queued",
    "url": <url>
}

The response if the report is currently running is as for "queued" but with the following changes:

{
    "pages": <integer>,
    "start": <dateTime>,
    "status": "running"
}

pages is the number of pages that have been examined so far, and start is the date and time that the report began running.

The response if the report has finished running is as for "queued" but with the following changes:

{
    "detail": <url>,
    "finish": <dateTime>,
    "pages": <integer>,
    "start": <dateTime>,
    "status": "callback",
    "summary": <summary>
}

If the report has finished running and the callback URL has been successfully accessed, then the response is as for "callback" but with the following changes:

{
    "calledBack": <dateTime>,
    "status": "complete"
}

"calledBack" is the date and time at which the callback URL was successfully accessed. Note that this value will be absent if the report was moved to the "complete" state due to the time limit expiring for retrying the callback URL.

For responses in the "callback" and "complete" states, the URL in detail may be fetched to retrieve the full data of the report. Note that this URL does not require any authentication.

Update a report

PUT /reports/<id>

{
    "metadata": <metadata>,
    "pages": <integer>,
    "summary": <summary>,
    "detail": <url>
}

Response:

{ report status as per "Fetch a report summary" }

Note that, unless the user accessing the API is a queue runner, only the metadata of a report can be updated; all other values associated with a report, once it has reached status "complete", are immutable (excepting that the report may be deleted).

A queue runner user may update "pages" to indicate progress during the report execution, and "summary" and "detail" to indicate that the report is complete.

Delete/unqueue a report

DELETE /reports/<id>

Response:

{ report status as per "Fetch a report summary" }

The report status returned is the status before it was deleted - if the report was still waiting to run then it is simply removed from the queue; if the report was running then it is cancelled and any results discarded.

The value of pages is approximately the number of pages that were examined, i.e. the number of pages that will be billed. If the status was "queued" then pages will be zero. If the status was "running", "callback" or "complete" then pages will be between zero and requestedPages.

Note that if a report is deleted before it completes then its callback URL will not be called.

Data formats

Basic types

dateTime

Date/times are stored as strings in RFC 3339 format; the timezone is always UTC. e.g. "2018-02-28T13:53:14.12Z". Note that the number of digits in the fractional part of the seconds may vary and the fractional part may be entirely absent.
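Because the fractional part varies in length or may be missing, a client parser has to be tolerant of it. A minimal Python sketch, assuming only the UTC "Z" form described above is ever produced: the fraction is padded or truncated to microseconds, Python's datetime resolution. The function name parse_datetime is hypothetical; note that datetime cannot represent an RFC 3339 leap second (":60") and would raise ValueError for one.

```python
import re
from datetime import datetime, timezone

# UTC-only RFC 3339 form used by this API, with an optional fraction of any length.
_UTC_DATETIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2}):([0-9]{2})(?:\.([0-9]+))?Z"
)


def parse_datetime(value: str) -> datetime:
    match = _UTC_DATETIME.fullmatch(value)
    if match is None:
        raise ValueError(f"not a UTC RFC 3339 date/time: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    # Pad or truncate the fraction to exactly six digits (microseconds).
    fraction = (match.group(7) or "").ljust(6, "0")[:6]
    return datetime(year, month, day, hour, minute, second, int(fraction),
                    tzinfo=timezone.utc)
```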

severity

The severity level of a diagnostic - one of "debug", "info", "minor", "moderate", "serious", "critical", or "alert". "debug" and "info" do not indicate a fault, they just convey information. "alert" indicates a fault so serious and urgent that it may warrant proactively alerting a user to the problem.
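A client will often want to filter diagnostics by a minimum severity. The Python sketch below assumes, as the wording suggests but does not state outright, that the levels are listed in increasing order of severity. The names SEVERITIES, is_fault and at_least are hypothetical.

```python
from typing import Iterable, List

# Assumption: the levels above are listed in increasing order of severity.
SEVERITIES = ["debug", "info", "minor", "moderate", "serious", "critical", "alert"]
_RANK = {level: rank for rank, level in enumerate(SEVERITIES)}


def is_fault(level: str) -> bool:
    """Informational levels (debug, info) are not faults; all others are."""
    return _RANK[level] >= _RANK["minor"]


def at_least(diagnostics: Iterable[dict], threshold: str) -> List[dict]:
    """Keep the diagnostics whose level is at or above the threshold."""
    return [d for d in diagnostics if _RANK[d["level"]] >= _RANK[threshold]]
```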

diagType

The type of a diagnostic - one of "file", "link", "transport" or "url".

"file" means that the diagnostic relates to the contents of the file - e.g. invalid HTML, or a corrupt image file.

"link" means that the diagnostic relates to the way the URL or the file it points to was linked in this specific instance - e.g. a non-image file used as an image in an HTML page, or a link that contains a URL fragment identifier that could not be found in the target URL's contents.

"transport" means that the diagnostic relates to the process of fetching the URL, but should not be considered a problem for pages that reference the URL - e.g. a valid HTTP response was returned but it contained a non-fatal problem such as an invalid Set-Cookie header.

"url" means the diagnostic relates to the URL itself or the process of fetching the URL and should be considered a problem for pages that reference the URL - e.g. the URL is malformed, or the hostname could not be found in the DNS, or the TCP connection was refused, or an HTTP error status was returned.

id

IDs are opaque strings of between 1 and 255 printable non-whitespace ASCII characters (i.e. characters 33-126).
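Since callbackId is supplied by the client, it can be worth validating it before queueing a report. A one-function Python sketch of the rule above (the name is_valid_id is hypothetical):

```python
def is_valid_id(value: str) -> bool:
    """1 to 255 characters, each printable non-whitespace ASCII (code 33-126)."""
    return 1 <= len(value) <= 255 and all(33 <= ord(ch) <= 126 for ch in value)
```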

linkAs

The type of resource that a URL has been linked as. Possible values include, but are not limited to, "document", "object", "embed", "media", "font", "image", "script", "style".

metadata

A JSON object up to a serialized length of 64 kilobytes. Metadata is treated as opaque by the system and will not be interpreted in any way, except inasmuch as it will be checked to ensure it is a syntactically-valid JSON object.
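A client-side pre-check for metadata might look like the following Python sketch. This document does not say exactly how the serialized length is measured, so two assumptions are made and labelled in the code: 64 kilobytes is taken as 64 x 1,024 bytes (matching the Size syntax under Configuration), and the length is measured on compact UTF-8 JSON. The name encode_metadata is hypothetical.

```python
import json

# Assumptions: "64 kilobytes" means 64 x 1,024 bytes, and length is measured
# on compact UTF-8 JSON. Neither is stated explicitly by the API docs.
MAX_METADATA_BYTES = 64 * 1024


def encode_metadata(metadata: object) -> bytes:
    """Serialize metadata compactly, checking it is an object within the limit."""
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a JSON object")
    encoded = json.dumps(metadata, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    if len(encoded) > MAX_METADATA_BYTES:
        raise ValueError(f"metadata is {len(encoded)} bytes; limit is {MAX_METADATA_BYTES}")
    return encoded
```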

status

The status of a report. One of "queued", "running", "callback" or "complete".

template

A template is a string that may contain substrings matching /{([a-z][a-z0-9_]*)?(:[a-z][a-z0-9_]*)?}/, which should be substituted before being displayed to the user. The substring "{}" should be replaced with just "{". The value to substitute should be looked up in the appropriate resolution path, which is a list of objects to search for a property with the name given by the first capture group. The first match in the resolution path is used, except that values that are undefined or null should be ignored. Objects in the resolution path that are undefined or null should also be ignored.

The second capture group, if present, indicates formatting that should be applied to display the value. Defined formats are shown below; unrecognised formats should be ignored and the value displayed as a plain string.

:duration - the value is a number of seconds (either integer or floating point) which should be displayed in a human-friendly way depending on its magnitude, e.g. "20ms", "45s", "2h", etc.

:size - the value is a number of bytes, which should be displayed in a human-friendly way depending on its magnitude, e.g. by appending "B", "kB", "MB", etc as appropriate.

Example:

"Hello {name} you are {age:duration} and you like {cook} {food}"

resolve with resolution path:

[
    undefined,
    {
        "cook": null,
        "food": "eggs",
        "name": "Michael"
    },
    null,
    {
        "age": 1262278080,
        "cook": "fried",
        "food": "ham"
    }
]

will produce:

"Hello Michael you are 40y and you like fried eggs"
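The resolution rules above can be sketched in Python as follows. This is one possible client implementation, not the service's own. Assumptions, all labelled in the code: None stands in for both JavaScript undefined and JSON null; the exact unit thresholds for :duration and :size are not specified, so this sketch picks the largest unit that fits and shows at most one decimal place; a year is taken as 365.2425 days, which reproduces "40y" in the example above; :size uses 1,024-byte multiples as in the Size syntax under Configuration; and a name not found anywhere in the path is left unsubstituted, since the behaviour for that case is not defined here.

```python
from __future__ import annotations

import re
from typing import Optional, Sequence

# Placeholder syntax from the description above: {name}, {name:format}, or {}.
_PLACEHOLDER = re.compile(r"\{([a-z][a-z0-9_]*)?(:[a-z][a-z0-9_]*)?\}")

# Assumed display units; the docs only give examples such as "20ms", "45s", "2h".
_DURATION_UNITS = [("y", 31_556_952), ("d", 86_400), ("h", 3_600), ("m", 60), ("s", 1)]
_SIZE_UNITS = [("GB", 1024 ** 3), ("MB", 1024 ** 2), ("kB", 1024)]


def _trim(number: float) -> str:
    """One decimal place, dropping a trailing '.0'."""
    text = f"{number:.1f}"
    return text[:-2] if text.endswith(".0") else text


def format_duration(seconds: float) -> str:
    for suffix, unit in _DURATION_UNITS:
        if abs(seconds) >= unit:
            return _trim(seconds / unit) + suffix
    return _trim(seconds * 1000) + "ms"


def format_size(size: float) -> str:
    for suffix, unit in _SIZE_UNITS:
        if abs(size) >= unit:
            return _trim(size / unit) + suffix
    return _trim(size) + "B"


def resolve_template(template: str, path: Sequence[Optional[dict]]) -> str:
    def substitute(match: re.Match) -> str:
        name, fmt = match.group(1), match.group(2)
        if name is None:
            # "{}" is an escaped "{"; a bare "{:fmt}" is undefined, so keep it.
            return "{" if fmt is None else match.group(0)
        for obj in path:
            # None stands in for both undefined and null, objects and values alike.
            if obj is None or obj.get(name) is None:
                continue
            value = obj[name]
            if fmt == ":duration" and isinstance(value, (int, float)):
                return format_duration(value)
            if fmt == ":size" and isinstance(value, (int, float)):
                return format_size(value)
            return str(value)  # no format, or an unrecognised one
        return match.group(0)  # not found anywhere: left as-is (assumption)

    return _PLACEHOLDER.sub(substitute, template)


greeting = resolve_template(
    "Hello {name} you are {age:duration} and you like {cook} {food}",
    [None,
     {"cook": None, "food": "eggs", "name": "Michael"},
     None,
     {"age": 1262278080, "cook": "fried", "food": "ham"}],
)
```

For a diagnostic, the path would be its parameters object, then the diagnostic itself, then the containing link or urlData object, as described under diagnostic below.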

url

URLs are always stored in canonical form. This means, for example, that hostnames are lower-cased and converted to Unicode form - so café.example.com would be stored in that form and not as café.EXAMPLE.COM or xn--caf-dma.example.com. Thus to determine if two URLs are the same, simple string comparison can be used.
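When comparing a URL entered by a user against URLs in a report, the client side has to apply the same hostname normalization first. The Python sketch below covers only the hostname rule described above (lower-casing and decoding Punycode "xn--" labels with Python's built-in punycode codec); it is an assumption that this is sufficient, since full IDNA processing and any other canonicalization steps (paths, ports, percent-encoding) are not specified here. The name unicode_hostname is hypothetical.

```python
def unicode_hostname(host: str) -> str:
    """Lower-case a hostname and decode any Punycode (xn--) labels to Unicode."""
    labels = []
    for label in host.lower().split("."):
        if label.startswith("xn--"):
            # Strip the ACE prefix and decode the remainder as raw Punycode.
            label = label[4:].encode("ascii").decode("punycode")
        labels.append(label)
    return ".".join(labels)
```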

Objects

box

{
    "height": <integer>,
    "width": <integer>,
    "x": <integer>,
    "y": <integer>
}

The bounding box of an element on the page.

diagnostic

{
    "category": <string>,
    "coords": [ <box> ],
    "detail": <template>,
    "extract": <string>,
    "level": <severity>,
    "image": <url>,
    "message": <template>,
    "module": <string>,
    "name": <string>,
    "page": <number>,
    "parameters": <object>,
    "selector": <string>,
    "source": <sourceInfo>,
    "subcategory": <string>,
    "tag": <string>,
    "type": <diagType>
}

A diagnostic. Templates should be resolved with a path that includes the parameters property (if present), followed by the diagnostic object itself, followed by the link or urlData object that contains this diagnostic.

detail (if present) provides detailed information about the diagnostic in markdown format.

extract (if present) shows a short extract of the document source surrounding the problem code.

message is a short one-line description of the diagnostic.

module is the name of the module that generated the diagnostic.

page is the 1-based index of the page on which the diagnostic occurs, if that is a meaningful concept for the format of the source file (e.g. for PDF it is, for HTML it is not).

category, subcategory (if present), and name together produce a unique identifier for this category of diagnostic.

If the diagnostic is under a urlData object, and type is "file", then (if present) selector is a CSS selector which identifies the element or elements involved in the diagnostic, and coords is an array which provides the bounding box on the page for each of those elements (excluding any that do not have a visible bounding box).

If the diagnostic is under a urlData object, and type is "file", then (if present) source is a sourceInfo object indicating which parts of the file source are relevant to the diagnostic.

image is the URL of an image related to the diagnostic, if there is a related image (for example, if it relates to an HTML <img> element).

tag is the element name related to the diagnostic, if it relates to an HTML or other XML-like element. The value will always be in lower case.

Example:

{
    "category": "link",
    "level": "serious",
    "message": "Broken link ({status} {message})",
    "module": "link",
    "name": "notfound",
    "parameters": {
        "status": 404,
        "message": "File not found"
    },
    "type": "url"
}
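Because category, subcategory and name together identify a kind of diagnostic, they make a natural grouping key. A short Python sketch that counts occurrences of each kind; subcategory is optional, so it is represented as None when absent. The helper names and the second sample kind ("html"/"attributes"/"duplicate") are hypothetical illustrations, not documented values.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

DiagnosticKey = Tuple[str, Optional[str], str]


def diagnostic_key(diagnostic: dict) -> DiagnosticKey:
    """(category, subcategory, name); subcategory is None when absent."""
    return (diagnostic["category"], diagnostic.get("subcategory"), diagnostic["name"])


def count_kinds(diagnostics: Iterable[dict]) -> Counter:
    """Count how many diagnostics of each kind occur."""
    return Counter(diagnostic_key(d) for d in diagnostics)


# Hypothetical input; only the "link"/"notfound" entries mirror the example above.
sample = [
    {"category": "link", "name": "notfound", "level": "serious", "type": "url"},
    {"category": "link", "name": "notfound", "level": "serious", "type": "url"},
    {"category": "html", "subcategory": "attributes", "name": "duplicate",
     "level": "minor", "type": "file"},
]
kinds = count_kinds(sample)
```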

error

{
    "code": <string>,
    "message": <string>
}

An error from the system API (as opposed to an error being reported about a site under test). code is a short error code for programmatic use, and message is a human-readable description of the error.

linkData

{
    "accessibleDescription": <string>,
    "accessibleName": <string>,
    "as": <linkAs>,
    "coords": [ <box> ],
    "diagnostics": [ <diagnostic> ],
    "extract": <string>,
    "fragment": <string>,
    "interaction": <boolean>,
    "module": <string>,
    "page": <integer>,
    "redirect": <string>,
    "rel": <string>,
    "selector": <string>,
    "tag": <string>,
    "text": <string>
}

A link from one resource to another. All the properties are optional.

as is a linkAs string value, indicating the purpose for which the URL is being fetched - e.g. "image" if it's an HTML <img> tag. If as is omitted this implies its value is "document".

extract (if present) shows a short extract of the document source surrounding the code that creates the link.

fragment is the fragment identifier of the link target. It does not include the leading #.

interaction is true if the link requires user interaction before the URL will be fetched (e.g. a clickable hyperlink) or false if the URL will be fetched automatically without user interaction (e.g. an inline image). If interaction is omitted this implies its value is false.

module is the name of the module that detected the link.

page is the 1-based index of the page on which the link occurs, if that is a meaningful concept for the format of the source file (e.g. for PDF it is, for HTML it is not).

redirect is present if this link was a redirect. Possible values are "temporary" for a temporary redirect such as 302, "permanent" for a permanent redirect such as 301, or "unknown" for some other type of redirect.

rel has a meaning similar to the HTML rel attribute on the <a> element.

selector is a CSS selector that identifies the element or elements on the page that created the link. coords is an array which provides the bounding box on the page for each of those elements.

tag is the element name that produced the link, if it was produced by an HTML or other XML-like element. The value will always be in lower case.

text is the displayed clickable text of a hypertext link. accessibleName is the "accessible name" of the link, and accessibleDescription is the "accessible description". Neither of these is included if it is the empty string, and accessibleName is not included if it is identical to text (ignoring differences in white-space).

Example:

<a href="http://example.com#main" title="Example">click here</a>

{
    "fragment": "main",
    "interaction": true,
    "selector": "a[href='http://example.com#main']",
    "tag": "a",
    "text": "click here"
}

Example:

<link rel=stylesheet href="main.css">

{
    "as": "style",
    "selector": "link[rel=stylesheet][href='main.css']",
    "tag": "link"
}

report

{
    "data": <object>,
    "pages": [ <url> ],
    "summary": <summary>,
    "urls": { <url>: <urlData> }
}

The full output of a report. summary is the report summary that can be fetched separately via the API above. pages is a list of the URLs of the HTML pages that were successfully fetched and tested, in the order that they were visited. Each page url can be looked up in the urls object. urls is a mapping from URLs to the results of testing those URLs. data is a mapping from module names to objects containing data recorded by that module. The data recorded depends on the module in question.
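To illustrate how pages, urls and the nested linkData objects fit together, here is a Python sketch that tallies diagnostic severities for the tested pages, including diagnostics attached to the links on those pages. It is a design choice of this sketch (not something the API prescribes) to skip the urlData of URLs that are not listed in pages. The function names are hypothetical, and the sample report is trimmed to the few fields used.

```python
from collections import Counter
from typing import Iterator, Tuple


def page_diagnostics(report: dict) -> Iterator[Tuple[str, dict]]:
    """Yield (page URL, diagnostic) for each tested page and the links on it."""
    urls = report.get("urls", {})
    for page in report.get("pages", []):
        url_data = urls.get(page, {})
        # Diagnostics about the page itself.
        for diagnostic in url_data.get("diagnostics", []):
            yield page, diagnostic
        # Diagnostics about individual links, stored inside each linkData.
        for link_list in url_data.get("links", {}).values():
            for link in link_list:
                for diagnostic in link.get("diagnostics", []):
                    yield page, diagnostic


def severity_counts(report: dict) -> Counter:
    return Counter(diagnostic["level"] for _, diagnostic in page_diagnostics(report))


example_report = {
    "pages": ["https://example.com/"],
    "urls": {
        "https://example.com/": {
            "start": "2018-02-28T13:53:14Z", "finish": "2018-02-28T13:53:15Z",
            "diagnostics": [{"level": "minor"}],
            "links": {"https://example.com/missing": [{"diagnostics": [{"level": "serious"}]}]},
        },
        # Not a tested page, so its own diagnostics are not counted here.
        "https://example.com/missing": {
            "start": "2018-02-28T13:53:14Z", "finish": "2018-02-28T13:53:15Z",
            "diagnostics": [{"level": "serious"}],
        },
    },
}
counts = severity_counts(example_report)
```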

sourceInfo

{
    "firstByte": <integer>,
    "firstColumn": <integer>,
    "firstLine": <integer>,
    "lastByte": <integer>,
    "lastColumn": <integer>,
    "lastLine": <integer>
}

Identifies the section of a text-based file (e.g. HTML) relevant to a diagnostic.

summary

{
    "base": <url>,
    "config": <config>,
    "engine": {
      "browser": <string>,
      "name": "dexter",
      "version": <string>
    },
    "finish": <dateTime>,
    "limits": [ <string> ],
    "pages": <integer>,
    "pageTypes": { <string>: <integer> },
    "requestedPages": <integer>,
    "start": <dateTime>,
    "urls": <integer>
}

The summary data about a report.

base is the base URL of the report.

start and finish are the start and finish times of the test.

pages is the number of pages that were successfully fetched and tested; urls is the number of URLs that were fetched (or attempted to be fetched). pageTypes indicates how many of each type of page (e.g. "html", "pdf") were tested.

urlData

{
    "contentTested": <boolean>,
    "data": <object>,
    "description": <string>,
    "diagnostics": [ <diagnostic> ],
    "finish": <dateTime>,
    "initialTime": <integer>,
    "links": { <url>: [ <linkData> ] },
    "location": <url>,
    "mimeParams": { <string>: <string> },
    "mimeType": <string>,
    "ok": <boolean>,
    "page": <boolean>,
    "pageLoadTime": <integer>,
    "screenshot": {
      "additional": {
        <string>: {
          "height": <integer>,
          "storage": <url>,
          "width": <integer>
        }
      },
      "height": <integer>,
      "storage": <url>,
      "width": <integer>
    },
    "sha3Hash": <string>,
    "size": <integer>,
    "start": <dateTime>,
    "storage": <url>,
    "title": <string>,
    "totalTime": <integer>,
    "transportSize": <integer>
}

The results of testing a URL. All properties are optional except start and finish, which are the start and finish times of the testing of the URL.

If contentTested is true then the contents of the resource itself were checked; otherwise the link checker merely checked the URL was not returning an error.

data is a mapping from module names to objects containing data recorded by that module. The data recorded depends on the module in question.

description is the metadata "description" of the resource, if appropriate and present - e.g. the <meta name=description> tag in HTML.

diagnostics is a list of diagnostics found relating directly to this URL.

initialTime is the time-to-first-byte of the URL fetch, totalTime is the time-to-last-byte. pageLoadTime is the time to load the URL as a page in the browser (from the start of the page fetch until the 'load' event is fired). All these values are integers in milliseconds.

links is a mapping indicating URLs that were found linked from this resource to a list of linkData objects. The URLs have their fragments stripped; fragment information is included in the linkData object. The value is a list because there may be multiple links (potentially of different types) from a resource to the same URL.

location is only present if the URL is a redirect, and indicates the final target of the redirection - i.e. if URL A redirects to URL B which redirects to URL C, then the location for URL A will be URL C. The immediate target of the redirection (i.e. URL B) will be available from the links object.

mimeType is the MIME type of the resource, if provided by the server - e.g. "text/html". It is always in lower case. mimeParams contains other parameters from the MIME Content-Type. Not all parameters are guaranteed to be stored, but charset will be stored if it was provided.

ok is true if the resource returned a 'success' HTTP response (2xx or 3xx), or false if another HTTP status was received.

If page is true then the URL counted towards the total of tested pages, and will be listed in the pages property of the top-level report object.

screenshot, if present, describes a width x height pixel screenshot of the page, which may be downloaded from the URL in its storage property; its additional property, if present, can contain zero or more additional variant screenshots keyed by a string, e.g. of the page viewed while emulating a mobile device. The top-level storage property, if present, specifies a URL at which the original file contents can be downloaded.

sha3Hash, if present, is the SHA3-256 hash of the file contents as a hexadecimal string.

size is the size of the resource in bytes. transportSize is the size of the resource as transmitted over the network (e.g. if the resource was sent using gzip encoding this may be smaller than size). If transportSize is missing it is the same as size.

title is the title of the resource, if appropriate - e.g. the contents of the <title> tag in HTML.

Configuration

Syntax

Configuration = ConfigItem ( ( ";" | "\n" ) ConfigItem )*
ConfigItem = Assignment | Condition "{" Configuration "}" | <empty>
Condition = URLPattern | RegExp
Assignment = Name | "!" Name | Name ( "=" | "+=" | "-=" ) Value
Name = ( Alpha | "_" ) ( Alpha | Digit | "_" )* ("." Name)*
Value = List | SingleValue
List = "[" SingleValue ( "," SingleValue )* "]"
SingleValue = Boolean | String | RegExp | Duration | Size | Integer

Whitespace (spaces and tabs) is allowed between syntactic tokens, except inside a Name. Wherever whitespace is allowed, a comment block may be inserted with /* ... */, or the rest of a line can be commented out with //.

Configuration can be global, for all URLs in the test, or can be made conditional and only active for certain URLs. The conditions can be based upon URLPatterns or RegExps. For example:

!caseSensitive // treat all URLs as not case sensitive
http://example.com/private/* {
    /* URLs under http://example.com/private
       are case sensitive */
    caseSensitive
}
/\.png$/ {
    !include // completely exclude URLs ending in ".png" from the test
}

Conditional blocks can be nested, and rules inside the nested blocks will only apply to URLs that match all the nested conditions. For example:

http://example.org/* {
    /\.png$/ {
        /* exclude URLs ending in .png, but only if they are under
           http://example.org/ */
        !include
    }
}

Boolean

Boolean values can be assigned either true or false. They can also be set to true by just specifying the item name, or false by specifying an ! followed by the item name.

Examples:

base.startpoint = false
base.startpoint  // set base.startpoint to true
!base.startpoint // set base.startpoint to false

String

Strings start and end with a ', and can contain backslash escapes:

\\ : a literal \
\n : a newline
\' : a literal '
\xXX : the character with code point XX (hexadecimal)

Examples:

'hello'
'i like \' apostrophes'
'i don\x27t like \\ backslashes'
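A minimal Python sketch of decoding such a string literal, accepting exactly the four escapes listed above. Two behaviours are assumptions not stated here: any other backslash sequence is rejected, and a raw newline inside the quotes is allowed. The name parse_string is hypothetical.

```python
_SIMPLE_ESCAPES = {"\\": "\\", "n": "\n", "'": "'"}
_HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_string(literal: str) -> str:
    """Decode a quoted config string, including its surrounding quotes."""
    if not literal.startswith("'"):
        raise ValueError("a string must start with '")
    out = []
    i = 1
    while i < len(literal):
        ch = literal[i]
        if ch == "'":
            # An unescaped quote must be the closing one.
            if i != len(literal) - 1:
                raise ValueError("unexpected text after closing quote")
            return "".join(out)
        if ch == "\\":
            nxt = literal[i + 1:i + 2]
            if nxt in _SIMPLE_ESCAPES:
                out.append(_SIMPLE_ESCAPES[nxt])
                i += 2
                continue
            if nxt == "x":
                digits = literal[i + 2:i + 4]
                if len(digits) == 2 and all(c in _HEX_DIGITS for c in digits):
                    out.append(chr(int(digits, 16)))
                    i += 4
                    continue
            raise ValueError(f"invalid escape at position {i}")
        out.append(ch)
        i += 1
    raise ValueError("unterminated string")
```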

RegExp

Regular expressions are the same as JavaScript regular expression literals. The only valid flag is i for case insensitivity.

Example:

/[a-z]+/i

Duration

A duration in seconds. The value consists of one or more integers, each followed by a unit specifier - ms for milliseconds, s for seconds, m for minutes, h for hours, d for days and w for weeks.

Examples:

500ms   // 500 milliseconds, i.e. 0.5 seconds
10m     // 10 minutes, i.e. 600 seconds
1d8h    // 1 day 8 hours, i.e. 115,200 seconds
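A Python sketch of parsing this duration syntax into seconds. Alternation order matters: "ms" must be tried before "m" so that "500ms" is not read as 500 minutes followed by a stray "s". The total is accumulated in whole milliseconds to avoid floating-point drift. The name parse_duration is hypothetical.

```python
import re

# Each unit expressed in whole milliseconds.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000,
            "d": 86_400_000, "w": 604_800_000}
_PART = re.compile(r"([0-9]+)(ms|s|m|h|d|w)")  # "ms" listed before "m"
_WHOLE = re.compile(r"(?:[0-9]+(?:ms|s|m|h|d|w))+")


def parse_duration(text: str) -> float:
    """Return the duration in seconds, e.g. '1d8h' -> 115200.0."""
    if _WHOLE.fullmatch(text) is None:
        raise ValueError(f"invalid duration: {text!r}")
    total_ms = sum(int(number) * _UNIT_MS[unit] for number, unit in _PART.findall(text))
    return total_ms / 1000
```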

Size

A size in bytes. The value consists of an integer, followed by an optional unit specifier (k for kilobytes, M for megabytes, or G for gigabytes), followed by a B.

Examples:

7B      // 7 bytes
12kB    // 12 kilobytes, i.e. 12,288 bytes
1MB     // 1 megabyte, i.e. 1,048,576 bytes
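The examples show that the multipliers are powers of 1,024. A short Python sketch of the size syntax on that basis (the name parse_size is hypothetical):

```python
import re

_SIZE = re.compile(r"([0-9]+)([kMG]?)B")
_MULTIPLIER = {"": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Return the size in bytes, e.g. '12kB' -> 12288."""
    match = _SIZE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(match.group(1)) * _MULTIPLIER[match.group(2)]
```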

Integer

A positive (or zero) integer. Leading zeroes are permitted.

URLPattern

URLPattern = Scheme "://" Host ( ":" Port )? "/" Path?

The scheme can be either http, https, or * to indicate either.

The host can be a simple string, to match that hostname, or *. followed by a simple string, to indicate that hostname and subdomains thereof, or . to match the hostname of the base URL of the test, or * to match any host.

The port can be omitted, to match the default port for the URL's scheme, or an integer value to match that specific port, or * to match any port.

The path is a string to match the URL's whole path. It can contain * characters as a wildcard.

Examples:

// matches any URL:
*://*:*/*
// matches any https: URL on port 443 at the base hostname of the test:
https://./*
// matches any URL with hostname 'example.com' and the default port:
*://example.com/*
// matches any URL with hostname 'example.com' or a subdomain thereof,
// e.g. 'www.example.com', and the default port:
*://*.example.com/*
// matches the exact URL http://example.com/foo/bar:
http://example.com/foo/bar
// matches any URL with a path beginning 'foo' and ending 'bar':
*://*:*/foo*bar
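The matching rules above can be sketched in Python as follows, reproducing the commented examples. Several points are assumptions where this document is silent: the path is matched without the query string, a "*" in the path may span "/" characters, matching is case-sensitive (the caseSensitive setting is not modelled), and the base hostname of the test is passed in explicitly as base_host. The function name url_matches is hypothetical.

```python
import re
from urllib.parse import urlsplit

_PATTERN = re.compile(r"(https?|\*)://([^/:]+)(?::([0-9]+|\*))?/(.*)")
_DEFAULT_PORTS = {"http": 80, "https": 443}


def url_matches(pattern: str, url: str, base_host: str) -> bool:
    match = _PATTERN.fullmatch(pattern)
    if match is None:
        raise ValueError(f"invalid URLPattern: {pattern!r}")
    scheme, host, port, path = match.groups()
    parts = urlsplit(url)
    if parts.scheme not in _DEFAULT_PORTS:
        return False
    # Scheme: exact match, or "*" for either http or https.
    if scheme != "*" and scheme != parts.scheme:
        return False
    # Host: "*" any, "." base host, "*.x" x or a subdomain of it, else exact.
    hostname = parts.hostname or ""
    if host == ".":
        if hostname != base_host:
            return False
    elif host.startswith("*."):
        suffix = host[2:]
        if hostname != suffix and not hostname.endswith("." + suffix):
            return False
    elif host != "*" and hostname != host:
        return False
    # Port: omitted means the default port for the URL's own scheme.
    default_port = _DEFAULT_PORTS[parts.scheme]
    actual_port = parts.port or default_port
    if port is None:
        if actual_port != default_port:
            return False
    elif port != "*" and int(port) != actual_port:
        return False
    # Path: compare without the leading "/"; "*" matches any run of characters.
    url_path = parts.path[1:] if parts.path.startswith("/") else parts.path
    path_regex = ".*".join(re.escape(piece) for piece in path.split("*"))
    return re.fullmatch(path_regex, url_path) is not None
```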