Chunked transfer encoding, since originally introduced in HTTP/1.1
in RFC 2068, is specified to use CRLF as the only line terminator.
Although tolerant applications may recognize a single LF, formally
this covers the start line and fields, and doesn't apply to chunks.
Strict chunked parsing is reaffirmed as intentional in RFC errata
ID 7633, notably "because it does not have to retain backwards
compatibility with 1.0 parsers".
A general RFC 2616 recommendation to tolerate deviations whenever
interpreted unambiguously doesn't apply here, because chunked body
is used to determine HTTP message framing; a relaxed parsing may
cause various security problems due to a broken delimitation.
For instance, this is possible when receiving chunked body from
intermediates that blindly parse chunk-ext or a trailer section
until CRLF, and pass it further without re-coding.
This allows to process a port subcomponent and save it in r->port
in a unified way, similar to r->headers_in.server. For HTTP/1.x
request line in the absolute form, r->host_end now includes a port
subcomponent, which is also consistent with HTTP/2 and HTTP/3.
The change allows modules to use the CONNECT method with HTTP/1.1 requests.
To do so, they need to set the "allow_connect" flag in the core server
configuration.
The $request_port variable contains the port passed by the client in the
request line (for HTTP/1.x) or ":authority" pseudo-header (for HTTP/2 and
HTTP/3). If the request line contains no host, or ":authority" is missing,
then $request_port is taken from the "Host" header, similar to the $host
variable.
The $is_request_port variable contains ":" if $request_port is non-empty,
and is empty otherwise.
Neither r->port_start nor r->port_end were ever used.
The r->port_end is set by the parser, though it was never used by
the following code (and was never usable, since not copied by the
ngx_http_alloc_large_header_buffer() without r->port_start set).
Multi headers are now using linked lists instead of arrays. Notably,
the following fields were changed: r->headers_in.cookies (renamed
to r->headers_in.cookie), r->headers_in.x_forwarded_for,
r->headers_out.cache_control, r->headers_out.link, u->headers_in.cache_control
u->headers_in.cookies (renamed to u->headers_in.set_cookie).
The r->headers_in.cookies and u->headers_in.cookies fields were renamed
to r->headers_in.cookie and u->headers_in.set_cookie to match header names.
The ngx_http_parse_multi_header_lines() and ngx_http_parse_set_cookie_lines()
functions were changed accordingly.
With this change, multi headers are now essentially equivalent to normal
headers, and following changes will further make them equivalent.
In 71edd9192f24 logging of invalid headers which were rejected with the
NGX_HTTP_PARSE_INVALID_HEADER error was restricted to just the "client
sent invalid header line" message, without any attempts to log the header
itself.
This patch returns logging of the header up to the invalid character and
the character itself. The r->header_end pointer is now properly set
in all cases to make logging possible.
The same logging is also introduced when parsing headers from upstream
servers.
Control characters (0x00-0x1f, 0x7f), space, and colon were never allowed in
header names. The only somewhat valid use is header continuation which nginx
never supported and which is explicitly obsolete by RFC 7230.
Previously, such headers were considered invalid and were ignored by default
(as per ignore_invalid_headers directive). With this change, such headers
are unconditionally rejected.
It is expected to make nginx more resilient to various attacks, in particular,
with ignore_invalid_headers switched off (which is inherently unsecure, though
nevertheless sometimes used in the wild).
Control characters (0x00-0x1f, 0x7f) were never allowed in URIs, and must
be percent-encoded by clients. Further, these are not believed to appear
in practice. On the other hand, passing such characters might make various
attacks possible or easier, despite the fact that currently allowed control
characters are not significant for HTTP request parsing.
From now on, requests with spaces in URIs are immediately rejected rather
than allowed. Spaces were allowed in 31e9677b15a1 (0.8.41) to handle bad
clients. It is believed that now this behaviour causes more harm than
good.
No valid CONNECT requests are expected to appear within nginx, since it
is not a forward proxy. Further, request line parsing will reject
proper CONNECT requests anyway, since we don't allow authority-form of
request-target. On the other hand, RFC 7230 specifies separate message
length rules for CONNECT which we don't support, so make sure to always
reject CONNECTs to avoid potential abuse.
When the request line contains request-target in the absolute-URI form,
it can contain path-empty instead of a single slash (see RFC 7230, RFC 3986).
Previously, the ngx_http_parse_request_line() function only accepted empty
path when there was no query string.
With this change, non-empty query is also correctly handled. That is,
request line "GET http://example.com?foo HTTP/1.1" is accepted and results
in $uri "/" and $args "foo".
Note that $request_uri remains "?foo", similarly to how spaces in URIs
are handled. Providing "/?foo", similarly to how "/" is provided for
"GET http://example.com HTTP/1.1", requires allocation.
As defined in HTTP/1.1, body chunks have the following ABNF:
chunk = chunk-size [ chunk-ext ] CRLF chunk-data CRLF
where chunk-data is a sequence of chunk-size octets.
With this change, chunk-data that doesn't end up with CRLF at chunk-size
offset will be treated as invalid, such as in the example provided below:
4
SEE-THIS-AND-
4
THAT
0
It is used at least by SOAP (M-POST method, defined by RFC 2774) and
by WebDAV versioning (VERSION-CONTROL and BASELINE-CONTROL methods,
defined by RFC 3253).
Both minor and major versions are now limited to 999 maximum. In case of
r->http_minor, this limit is already implied by the code. Major version,
r->http_major, in theory can be up to 65535 with current code, but such
values are very unlikely to become real (and, additionally, such values
are not allowed by RFC 7230), so the same test was used for r->http_major.
Minimal data length we expect for further calls was calculated incorrectly
if parsing stopped right after parsing chunk size. This might in theory
affect clients and/or backends using LF instead of CRLF.
Patch by Dmitry Popov.
Windows treats "/directory./" identical to "/directory/". Do the same
when working on Windows. Note that the behaviour is different from one
with last path component (where multiple spaces and dots are ignored by
Windows).
Additional parsing logic added to correctly handle RFC 3986 compliant IPv6 and
IPvFuture characters enclosed in square brackets.
The host validation was completely rewritten. The behavior for non IP literals
was changed in a more proper and safer way:
- Host part is now delimited either by the first colon or by the end of string
if there's no colon. Previously the last colon was used as delimiter which
allowed substitution of a port number in the $host variable.
(e.g. Host: 127.0.0.1:9000:80)
- Fixed stripping of the ending dot in the Host header when the host was also
followed by a port number.
(e.g. Host: nginx.com.:80)
- Fixed upper case characters detection. Previously it was broken which led to
wasting memory and CPU.