A URL parser for HTTP requests, compliant with RFC 7230

request-target

Another URL parser?

The core url module is great for parsing generic URLs. Unfortunately, the URL of an HTTP request (formally called the request-target) is not just a generic URL. It must obey the requirements of both RFC 3986 (the generic URI syntax) and RFC 7230 (HTTP/1.1).

The problems

The core http module does not validate or sanitize req.url.
The legacy url.parse() function also allows illegal characters to appear.
The newer url.URL() constructor will attempt to convert the input into a properly encoded URL with only legal characters. This is better for the general case. However, the official http spec (RFC 7230, section 3.1.1) states:
A recipient SHOULD NOT attempt to autocorrect and then process the request without a redirect, since the invalid request-line might be deliberately crafted to bypass security filters along the request chain.

This means a malformed URL should be treated as a violation of the http protocol. It's not something that should be accepted or autocorrected, and it's not something that higher-level code should ever have to worry about.
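The difference between the three behaviours can be seen with a small experiment. The following is a minimal sketch that uses only Node core; the isValidOriginForm helper is a hypothetical, simplified check written for illustration (it covers origin-form targets only) and is not the implementation used by request-target.

```javascript
const url = require('node:url');

// A non-ASCII character is not legal in a request-target.
const raw = '/caf\u00e9';

// Legacy parser: the illegal character passes through untouched.
const legacy = url.parse(raw).pathname; // '/café'

// WHATWG parser: the input is silently rewritten into an encoded form.
const whatwg = new URL(raw, 'http://example.com').pathname; // '/caf%C3%A9'

// Hypothetical strict check (an assumption for illustration): accept only
// the characters RFC 3986 allows in a path and query, and reject the rest
// instead of correcting it.
const PCHAR = "(?:[A-Za-z0-9\\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})";
const ORIGIN_FORM = new RegExp(`^(?:/${PCHAR}*)+(?:\\?(?:${PCHAR}|[/?])*)?$`);

function isValidOriginForm(target) {
  return ORIGIN_FORM.test(target);
}

console.log(legacy, whatwg, isValidOriginForm(raw));
```

Under this sketch, the raw input is rejected outright, while its percent-encoded equivalent '/caf%C3%A9' would be accepted.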

Adoption into core

Because of backwards compatibility, it's unlikely that the logic expressed in request-target will be incorporated into the core http module. My recommendation is to incorporate it as an alternative function in the core url module. If that never happens, just make sure you're using this package when parsing req.url.

How to use

The function takes a request object as input (not a URL string) because the http spec requires inspection of req.method and the Host header (req.headers.host) in order to properly interpret the URL of a request. If the function returns null, the request should not be processed further: either destroy the connection or respond with 400 Bad Request.
If the request is valid, it will return an object with five properties: protocol, hostname, port, pathname, and search. The first three properties are either non-empty strings or null, and are mutually dependent (either all three are null or none of them are). The pathname property is always a non-empty string, and the search property is always a string, possibly empty.
If the first three properties are not null, it means the request was in absolute-form or a valid non-empty Host header was provided.
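const parse = require('request-target');

const result = parse(req);
if (result) {
  // { protocol, hostname, port, pathname, search }
} else {
  // Invalid request-target: destroy the connection or respond with 400 Bad Request
}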
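To keep the null check in one place, the parser can be wrapped around a request handler. This is only a sketch under stated assumptions: createHandler and onRequest are hypothetical names introduced here, and parse is passed in as an argument so the wrapper relies on nothing beyond the contract described above (null for an invalid request, a result object otherwise).

```javascript
// Wraps a request handler so that it only ever sees requests whose
// request-target was accepted by `parse`. Invalid requests are answered
// with 400 Bad Request and never reach `onRequest`.
function createHandler(parse, onRequest) {
  return function handler(req, res) {
    const target = parse(req);
    if (target === null) {
      res.writeHead(400, { Connection: 'close' });
      res.end();
      return;
    }
    // target is { protocol, hostname, port, pathname, search }
    onRequest(req, res, target);
  };
}

// Hypothetical wiring (assumes the request-target package is installed):
//   const http = require('node:http');
//   const parse = require('request-target');
//   http.createServer(createHandler(parse, (req, res, target) => {
//     res.end(target.pathname + '\n');
//   })).listen(3000);
```

Responding with 400 is one of the two options named above; destroying the socket instead would be an equally valid choice for the rejection branch.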

Unexpected benefits

The goal of request-target was not to create a fast parser, but it turns out this implementation can be roughly 1.5x to 9x faster than the general-purpose parsers in core.
$ npm run benchmark
legacy url.parse() x 371,681 ops/sec ±0.88% (297996 samples)
whatwg new URL() x 58,766 ops/sec ±0.3% (118234 samples)
request-target x 552,748 ops/sec ±0.54% (344809 samples)

Run the benchmark yourself with npm run benchmark.