When using OpenSSL 3.5, the crypto_release_rcd QUIC callback can be
called late, after the QUIC connection was already closed on handshake
failure, resulting in a segmentation fault. For instance, it happened
if a client Finished message didn't align with a record boundary.
This commit is prepared for HTTP/2 and HTTP/3 support.
The ALPN protocol is now set per-connection in
ngx_http_upstream_ssl_init_connection(), allowing proper protocol negotiation
for each individual upstream connection regardless of SSL context sharing.
Chunked transfer encoding, since originally introduced in HTTP/1.1
in RFC 2068, is specified to use CRLF as the only line terminator.
Although tolerant applications may recognize a single LF, formally
this covers the start line and fields, and doesn't apply to chunks.
Strict chunked parsing is reaffirmed as intentional in RFC errata
ID 7633, notably "because it does not have to retain backwards
compatibility with 1.0 parsers".
A general RFC 2616 recommendation to tolerate deviations whenever
interpreted unambiguously doesn't apply here, because chunked body
is used to determine HTTP message framing; a relaxed parsing may
cause various security problems due to a broken delimitation.
For instance, this is possible when receiving chunked body from
intermediates that blindly parse chunk-ext or a trailer section
until CRLF, and pass it further without re-coding.
- Issue templates are replaced with forms. Forms allow to explicitly ask
for certain info before an issue is opened, they can be programmatically
queried via GH actions to get the data in fields.
- Added language around GH discussions vs the forum in the issue forms.
- Added GH discussions templates. These templates delineate which types
of discussions belong on GitHub vs the community forum.
- Created SUPPORT.md to delineate which types of topics belong on GitHub
vs different support channels (community forum/docs/commercial support).
- Updated CONTRIBUTING.md:
- Removed text that belongs in SUPPORT.md.
- Added F5 CLA clarifying text.
- Added badges to README.md. Most of these are there to make information
even clearer, moreso for users reading README.md from sources outside
GitHub.
If request URI was shorter than location prefix, as after replacement
with try_files, location length was used to copy the remaining URI part
leading to buffer overread.
The fix is to replace full request URI in this case. In the following
configuration, request "/123" is changed to "/" when sent to backend.
location /1234 {
try_files /123 =404;
proxy_pass http://127.0.0.1:8080/;
}
Closes#983 on GitHub.
This allows to process a port subcomponent and save it in r->port
in a unified way, similar to r->headers_in.server. For HTTP/1.x
request line in the absolute form, r->host_end now includes a port
subcomponent, which is also consistent with HTTP/2 and HTTP/3.
Validation is rewritten to follow RFC 3986 host syntax, based on
ngx_http_parse_request_line(). The following is now rejected:
- the rest of gen-delims "#", "?", "@", "[", "]"
- other unwise delims <">, "<", ">", "\", "^", "`', "{", "|", "}"
- IP literals with a trailing dot, missing closing bracket, or pct-encoded
- a port subcomponent with invalid values
- characters in upper half
In addition to moving memcpy() under the length condition in 15bf6d8cc,
which addressed a reported UB due to string function conventions, this
is repeated for advancing an input buffer, to make the resulting code
more clean and readable.
Additionally, although considered harmless for both string functions and
additive operators, as previously discussed in GitHub PR 866, this fixes
the main source of annoying sanitizer reports in the module.
Prodded by UndefinedBehaviorSanitizer (pointer-overflow).
The function interface is changed to follow a common approach
to other functions used to setup SSL_CTX, with an exception of
"ngx_conf_t *cf" since it is not bound to nginx configuration.
This is required to report and propagate SSL_CTX_set_ex_data()
errors, as reminded by Coverity (CID 1668589).
For certain compilers we embed the compiler version used to build nginx
in the binary, retrievable via 'nginx -V', e.g.
$ ./objs/nginx -V
...
built by gcc 15.2.1 20250808 (Red Hat 15.2.1-1) (GCC)
...
However if the CFLAGS environment variable is set this would be omitted.
This is due to the compiler specific auto/cc files not being run when
the CFLAGS environment variable is set, this is so entities can set
their own compiler flags, and thus the NGX_COMPILER variable isn't set.
Nonetheless it is a useful thing to have so re-work the auto scripts to
move the version gathering out of the individual auto/cc/$NGX_CC_NAME
files and merge them into auto/cc/name.
Link: <https://github.com/nginx/nginx/issues/878>
The new directives add_header_inherit and add_trailer_inherit allow
to alter inheritance rules for the values specified in the add_header
and add_trailer directives in a convenient way.
The "merge" parameter enables appending the values from the previous level
to the current level values.
The "off" parameter cancels inheritance of the values from the previous
configuration level, similar to add_header "" (2194e75bb).
The "on" parameter (default) enables the standard inheritance behaviour,
which is to inherit values from the previous level only if there are no
directives on the current level.
The inheritance rules themselves are inherited in a standard way. Thus,
for example, "add_header_inherit merge;" specified at the top level will
be inherited in all nested levels recursively unless redefined below.
Variables contain the IANA name of the signature scheme[1] used to sign
the TLS handshake.
Variables are only meaningful when using OpenSSL 3.5 and above, with older
versions they are empty. Moreover, since this data isn't stored in a
serialized session, variables are only available for new sessions.
[1] https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml
Requested by willmafh.
After f10bc5a763 the address was set to NULL only when local address was
not specified at all. In case complex value evaluated to an empty or
invalid string, local address remained unchanged. Currenrly this is not
a problem since the value is only set once. This change is a preparation
for being able to change the local address after initial setting.
The change allows modules to use the CONNECT method with HTTP/1.1 requests.
To do so, they need to set the "allow_connect" flag in the core server
configuration.
The $request_port variable contains the port passed by the client in the
request line (for HTTP/1.x) or ":authority" pseudo-header (for HTTP/2 and
HTTP/3). If the request line contains no host, or ":authority" is missing,
then $request_port is taken from the "Host" header, similar to the $host
variable.
The $is_request_port variable contains ":" if $request_port is non-empty,
and is empty otherwise.
BoringSSL/AWS-LC provide two callbacks for each compression algorithm,
which may be used to compress and decompress certificates in runtime.
This change implements compression support with zlib, as enabled with
the ssl_certificate_compression directive. Compressed certificates
are stored in certificate exdata and reused in subsequent connections.
Notably, AWS-LC saves an X509 pointer in SSL connection, which allows
to use it from SSL_get_certificate() for caching purpose. In contrast,
BoringSSL reconstructs X509 on-the-fly, though given that it doesn't
support multiple certificates, always replacing previously configured
certificates, we use the last configured one from ssl->certs, instead.
In rare cases, it was possible to get into this error state on reload
with improperly updated file timestamps for certificate and key pairs.
The fix is to retry on X509_R_KEY_VALUES_MISMATCH, similar to 5d5d9adcc.
Additionally, loading SSL certificate is updated to avoid certificates
discarded on retry to appear in ssl->certs and in extra chain.
The change introduces an SNI based virtual server selection during
early ClientHello processing. The callback is available since
OpenSSL 1.1.1; for older OpenSSL versions, the previous behaviour
is kept.
Using the ClientHello callback sets a reasonable processing order
for the "server_name" TLS extension. Notably, session resumption
decision now happens after applying server configuration chosen by
SNI, useful with enabled verification of client certificates, which
brings consistency with BoringSSL behaviour. The change supersedes
and reverts a fix made in 46b9f5d38 for TLSv1.3 resumed sessions.
In addition, since the callback is invoked prior to the protocol
version negotiation, this makes it possible to set "ssl_protocols"
on a per-virtual server basis.
To keep the $ssl_server_name variable working with TLSv1.2 resumed
sessions, as previously fixed in fd97b2a80, a limited server name
callback is preserved in order to acknowledge the extension.
Note that to allow third-party modules to properly chain the call to
ngx_ssl_client_hello_callback(), the servername callback function is
passed through exdata.
This was broken in 7468a10b6 (1.29.0), resulting in a missing diagnostics
and SSL error queue not cleared for SSL handshakes rejected by SNI, seen
as "ignoring stale global SSL error" alerts, for instance, when doing SSL
shutdown of a long standing connection after rejecting another one by SNI.
The fix is to move the qc->error check after c->ssl->handshake_rejected is
handled first, to make the error queue cleared. Although not practicably
visible as needed, this is accompanied by clearing the error queue under
the qc->error case as well, to be on the safe side.
As an implementation note, due to the way of handling invalid transport
parameters for OpenSSL 3.5 and above, which leaves a passed pointer not
advanced on error, SSL_get_error() may return either SSL_ERROR_WANT_READ
or SSL_ERROR_WANT_WRITE depending on a library. To cope with that, both
qc->error and c->ssl->handshake_rejected checks were moved out of
"sslerr != SSL_ERROR_WANT_READ".
Also, this reconstructs a missing "SSL_do_handshake() failed" diagnostics
for the qc->error case, replacing using ngx_ssl_connection_error() with
ngx_connection_error(). It is made this way to avoid logging at the crit
log level because qc->error set is expected to have an empty error queue.
Reported and tested by Vladimir Homutov.
The example configuration previously specified 'charset koi8-r',
which is a legacy Cyrillic encoding. As koi8-r is rarely used today
and modern browsers handle UTF-8 by default, specifying the charset
explicitly is unnecessary. Removing the directive keeps the example
configuration concise and aligned with current best practices.
They might be reused in a session if an SMTP client proceeded
unauthenticated after previous invalid authentication attempts.
This could confuse an authentication server when passing stale
credentials along with "Auth-Method: none".
The condition to send the "Auth-Salt" header is similarly refined.
The ssl_certificate_compression directive allows to send compressed
server certificates. In OpenSSL, they are pre-compressed on startup.
To simplify configuration, the SSL_OP_NO_TX_CERTIFICATE_COMPRESSION
option is automatically cleared if certificates were pre-compressed.
SSL_CTX_compress_certs() may return an error in legitimate cases,
e.g., when none of compression algorithms is available or if the
resulting compressed size is larger than the original one, thus it
is silently ignored.
Certificate compression is supported in Chrome with brotli only,
in Safari with zlib only, and in Firefox with all listed algorithms.
It is supported since Ubuntu 24.10, which has OpenSSL with enabled
zlib and zstd support.
The actual list of algorithms supported in OpenSSL depends on how
the library was configured; it can be brotli, zlib, zstd as listed
in RFC 8879.
Certificate compression is supported since OpenSSL 3.2, it is enabled
automatically as negotiated in a TLSv1.3 handshake.
Using certificate compression and decompression in runtime may be
suboptimal in terms of CPU and memory consumption in certain typical
scenarios, hence it is disabled by default on both server and client
sides. It can be enabled with ssl_conf_command and similar directives
in upstream as appropriate, for example:
ssl_conf_command Options RxCertificateCompression;
ssl_conf_command Options TxCertificateCompression;
Compressing server certificates requires additional support, this is
addressed separately.
The function contains mostly HTTP/1.x specific request processing,
which has no use in other protocols. After the previous change in
HTTP/2, it can now be hidden.
This is an API change.
Previously, it misused the Host header processing resulting in
400 (Bad Request) errors for a valid request that contains both
":authority" and Host headers with the same value, treating it
after 37984f0be as if client sent more than one Host header.
Such an overly strict handling violates RFC 9113.
The fix is to process ":authority" as a distinct header, similarly
to processing an authority component in the HTTP/1.x request line.
This allows to disambiguate and compare Host and ":authority"
values after all headers were processed.
With this change, the ngx_http_process_request_header() function
can no longer be used here, certain parts were inlined similar to
the HTTP/3 module.
To provide compatibility for misconfigurations that use $http_host
to return the value of the ":authority" header, the Host header,
if missing, is now reconstructed from ":authority".
Previously, when using HTTP/2 over SSL, an early hints HEADERS frame was
queued in SSL buffer, and might not be immediately flushed. This resulted
in a delay of early hints delivery until the main response was sent.
The fix is to set the flush flag for the early hints HEADERS frame buffer.
RFC 9114, Section 4.3.1. specifies a restriction for :authority and Host
coexistence in an HTTP/3 request:
: If both fields are present, they MUST contain the same value.
Previously, this restriction was correctly enforced only for portless
values. When Host contained a port, the request failed as if :authority
and Host were different, regardless of :authority presence.
This happens because the value of r->headers_in.server used for :authority
has port stripped. The fix is to use r->host_start / r->host_end instead.
This might happen for Huffman encoded string literals as the result
of length expansion. Notably, the maximum length of string literals
is already limited with the "large_client_header_buffers" directive,
so this was only possible with nonsensically large configured limits.
The kevent udata field was changed from intptr_t to "void *",
similar to other BSDs and Darwin.
The NGX_KQUEUE_UDATA_T macro is adjusted to reflect that change,
fixing -Werror=int-conversion errors.
The NGX_KQUEUE_UDATA_T macro is used to compensate the incompatible
kqueue() API in NetBSD, it doesn't really belong to feature tests.
The change limits the macro visibility to the kqueue event module.
Moving from autotests also simplifies testing a particular NetBSD
version as seen in a subsequent change.
The kevent udata field is special in that we maintain compatibility
with NetBSD versions that predate using the "void *" type.
The fix is to cast to intermediate uintptr_t that is casted back to
"void *" where appropriate.
Prior to OpenSSL 3.0, OPENSSL_VERSION_NUMBER used the following format:
MNNFFPPS: major minor fix patch status
Where the status nibble (S) has 0+ for development and f for release.
The format was changed in OpenSSL 3.0.0, where it is always zero:
MNN00PP0: major minor patch
Disabling the build dependency feature is safe assuming that
nginx/Windows release zip is always built from a clean tree.
This allows to speed up total build time by around 40%.
As it may not be suitable in general, the option resides here
and not in configure.
In OpenSSL 3.5.0, the "quic_transport_parameters" extension set
internally by the QUIC API is cleared on the SSL context switch,
which disables sending QUIC transport parameters if switching to
a different server block on SNI. See the initial report in [1].
This is fixed post OpenSSL 3.5.0 [2]. The fix is anticipated in
OpenSSL 3.5.1, which has not been released yet. When building
with OpenSSL 3.5, OpenSSL compat layer is now used by default.
The OpenSSL 3.5 QUIC API support can be switched back using
--with-cc-opt='-DNGX_QUIC_OPENSSL_API=1'.
[1] https://github.com/nginx/nginx/issues/711
[2] https://github.com/openssl/openssl/commit/45bd3c3798
The gRPC module context has connection specific state, which can be lost
after request reinitialization when it comes to processing early hints.
The fix is to do only a portion of u->reinit_request() implementation
required after processing early hints, now inlined in modules.
Now NGX_HTTP_UPSTREAM_EARLY_HINTS is returned from u->process_header()
for early hints. When reading a cached response, this code is mapped
to NGX_HTTP_UPSTREAM_INVALID_HEADER to indicate invalid header format.
There were a few random places where 0 was being used as a null pointer
constant.
We have a NULL macro for this very purpose, use it.
There is also some interest in actually deprecating the use of 0 as a
null pointer constant in C.
This was found with -Wzero-as-null-pointer-constant which was enabled
for C in GCC 15 (not enabled with Wall or Wextra... yet).
Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059>
The functions ngx_http_merge_types() & ngx_conf_merge_path_value()
return either NGX_CONF_OK aka NULL aka ((void *)0) (probably) or
NGX_CONF_ERROR aka ((void *)-1).
They don't return an integer constant which is what NGX_OK aka (0) is.
Lets use the right thing in the function return check.
This was found with -Wzero-as-null-pointer-constant which was enabled
for C in GCC 15 (not enabled with Wall or Wextra... yet).
Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059>
The change implements processing upstream early hints response in
ngx_http_proxy_module and ngx_http_grpc_module. A new directive
"early_hints" enables sending early hints to the client. By default,
sending early hints is disabled.
Example:
map $http_sec_fetch_mode $early_hints {
navigate $http2$http3;
}
early_hints $early_hints;
proxy_pass http://example.com;
The support first appeared in OS X Mavericks 10.9 and documented since
OS X Yosemite 10.10.
It has a subtle implementation difference from other operating systems
in that the TCP_KEEPALIVE socket option (used in place of TCP_KEEPIDLE)
isn't inherited from a listening socket to an accepted socket.
An apparent reason for this behaviour is that it might be preserved for
the sake of backward compatibility. The TCP_KEEPALIVE socket option is
not inherited since appearance in OS X Panther 10.3, which long predates
two other TCP_KEEPINTVL and TCP_KEEPCNT socket options.
Thanks to Andy Pan for initial work.
Certain providers may attempt to reload the key on the first use after a
fork. Such attempt would require re-prompting the pin, and this time we
are not able to pass the password callback.
While it is addressable with configuration for a specific provider, it would
be prudent to ensure that no such prompts could block worker processes by
setting the default UI method.
UI_null() first appeared in 1.1.1 along with the OSSL_STORE, so it is safe
to assume the same set of guards.
A new "store:..." prefix for the "ssl_certificate_key" directive allows
loading keys via the OSSL_STORE API.
The change is required to support hardware backed keys in OpenSSL 3.x using
the new "provider(7ossl)" modules, such as "pkcs11-provider". While the
engine API is present in 3.x, some operating systems (notably, RHEL10)
have already disabled it in their builds of OpenSSL.
Related: https://trac.nginx.org/nginx/ticket/2449
Similarly to the QUIC API originated in BoringSSL, this API allows
to register custom TLS callbacks for an external QUIC implementation.
See the SSL_set_quic_tls_cbs manual page for details.
Due to a different approach used in OpenSSL 3.5, handling of CRYPTO
frames was streamlined to always write an incoming CRYPTO buffer to
the crypto context. Using SSL_provide_quic_data(), this results in
transient allocation of chain links and buffers for CRYPTO frames
received in order. Testing didn't reveal performance degradation of
QUIC handshakes, https://github.com/nginx/nginx/pull/646 provides
specific results.
Using SSL_in_init() to inspect a handshake state was replaced with
SSL_is_init_finished(). This represents a more complete fix to the
BoringSSL issue addressed in 22671b37e.
This provides awareness of the early data handshake state when using
OpenSSL 3.5 TLS callbacks in 0-RTT enabled configurations, which, in
particular, is used to avoid premature completion of the initial TLS
handshake, before required client handshake messages are received.
This is a non-functional change when using BoringSSL. It supersedes
testing non-positive SSL_do_handshake() results in all supported SSL
libraries, hence simplified.
In preparation for using OpenSSL 3.5 TLS callbacks.
Encryption level values are decoupled from ssl_encryption_level_t,
which is now limited to BoringSSL QUIC callbacks, with mappings
provided. Although the values match, this provides a technically
safe approach, in particular, to access protection level sized arrays.
In preparation for using OpenSSL 3.5 TLS callbacks.
It is now called from ngx_quic_handle_crypto_frame(), prior to proceeding
with the handshake. With this logic removed, the handshake function is
renamed to ngx_quic_handshake() to better match ngx_ssl_handshake().
All definitions now set in ngx_event_quic.h, this includes moving
NGX_QUIC_OPENSSL_COMPAT from autotests to compile time. Further,
to improve code readability, a new NGX_QUIC_QUICTLS_API macro is
used for QuicTLS that provides old BoringSSL QUIC API.
Previously, they might be logged on every add_handshake_data
callback invocation when using OpenSSL compat layer and processing
coalesced handshake messages.
Further, the ALPN error message is adjusted to signal the missing
extension. Possible reasons were previously narrowed down with
ebb6f7d65 changes in the ALPN callback that is invoked earlier in
the handshake.
Following the previous change that removed posting a close event
in OpenSSL compat layer, now ngx_quic_close_connection() is always
called on error path with either NGX_ERROR or qc->error set.
This allows to remove a special value -1 served as a missing error,
which simplifies the code. Partially reverts d3fb12d77.
Also, this improves handling of the draining connection state, which
consists of posting a close event with NGX_OK and no qc->error set,
where it was previously converted to NGX_QUIC_ERR_INTERNAL_ERROR.
Notably, this is rather a cosmetic fix, because drained connections
do not send any packets including CONNECTION_CLOSE, and qc->error
is not otherwise used.
Changed handshake callbacks to always return success. This allows to avoid
logging SSL_do_handshake() errors with empty or cryptic "internal error"
OpenSSL error messages at the inappropriate "crit" log level.
Further, connections with failed callbacks are closed now right away when
using OpenSSL compat layer. This change supersedes and reverts c37fdcdd1,
with the conditions to check callbacks invocation kept to slightly improve
code readability of control flow; they are optimized out in the resulting
assembly code.
Logging level for such errors, which should not normally happen,
is changed to NGX_LOG_ALERT, and ngx_log_error() is replaced with
ngx_ssl_error() for consistency with the rest of the code.
Previously, it was not possible to send acknowledgments if the
congestion window was limited or temporarily exceeded, such as
after sending a large response or MTU probe. If ACKs were not
received from the peer for some reason to update the in-flight
bytes counter below the congestion window, this might result in
a stalled connection.
The fix is to send ACKs regardless of congestion control. This
meets RFC 9002, Section 7:
: Similar to TCP, packets containing only ACK frames do not count
: toward bytes in flight and are not congestion controlled.
This is a simplified implementation to send ACK frames from the
head of the queue. This was made possible after 6f5f17358.
Reported in trac ticket #2621 and subsequently by Vladimir Homutov:
https://mailman.nginx.org/pipermail/nginx-devel/2025-April/ZKBAWRJVQXSZ2ISG3YJAF3EWMDRDHCMO.html
This extends the target selection implemented in dad6ec3aa6 to support
Windows ARM64 platforms. OpenSSL support for VC-WIN64-ARM target first
appeared in 1.1.1 and is present in all currently supported (3.x)
branches.
As a side effect, ARM64 Windows builds will get 16-byte alignment along
with the rest of non-x86 platforms. This is safe, as malloc on 64-bit
Windows guarantees the fundamental alignment of allocations, 16 bytes.
Previously, the default pool alignment used sizeof(unsigned long), with
the expectation that this would match to a platform word size. Certain
64-bit platforms prove this assumption wrong by keeping the 32-bit long
type, which is fully compliant with the C standard.
This introduces a possibility of suboptimal misaligned access to the
data allocated with ngx_palloc() on the affected platforms, which is
addressed here by changing the default NGX_ALIGNMENT to a pointer size.
As we override the detection in auto/os/conf for all the machine types
except x86, and Unix-like 64-bit systems prefer the 64-bit long, the
impact of the change should be limited to Win64 x64.
After fixing ngx_http_v3_encode_varlen_int() in 400eb1b628,
NGX_HTTP_V3_VARLEN_INT_LEN retained the old value of 4, which is
insufficient for the values over 1073741823 (1G - 1).
The NGX_HTTP_V3_VARLEN_INT_LEN macro is used in ngx_http_v3_uni.c to
format stream and frame types. Old buffer size is enough for formatting
this data. Also, the macro is used in ngx_http_v3_filter_module.c to
format output chunks and trailers. Considering output_buffers and
proxy_buffer_size are below 1G in all realistic scenarios, the old buffer
size is enough here as well.
RFC 9002, Section 6.1.1 defines packet reordering threshold as 3. Testing
shows that such low value leads to spurious packet losses followed by
congestion window collapse. The change implements dynamic packet threshold
detection based on in-flight packet range. Packet threshold is defined
as half the number of in-flight packets, with mininum value of 3.
Also, renamed ngx_quic_lost_threshold() to ngx_quic_time_threshold()
for better compliance with RFC 9002 terms.
Previosly the threshold was hardcoded at 10000. This value is too low for
high BDP networks. For example, if all frames are STREAM frames, and MTU
is 1500, the upper limit for congestion window would be roughly 15M
(10000 * 1500). With 100ms RTT it's just a 1.2Gbps network (15M * 10 * 8).
In reality, the limit is even lower because of other frame types. Also,
the number of frames that could be used simultaneously depends on the total
amount of data buffered in all server streams, and client flow control.
The change sets frame threshold based on max concurrent streams and stream
buffer size, the product of which is the maximum number of in-flight stream
data in all server streams at any moment. The value is divided by 2000 to
account for a typical MTU 1500 and the fact that not all frames are STREAM
frames.
If connection is network-limited, MTU probes have little chance of being
sent since congestion window is almost always full. As a result, PMTUD
may not be able to reach the real MTU and the connection may operate with
a reduced MTU. The solution is to ignore the congestion window. This may
lead to a temporary increase in in-flight count beyond congestion window.
As per RFC 9000, Section 14.4:
Loss of a QUIC packet that is carried in a PMTU probe is therefore
not a reliable indication of congestion and SHOULD NOT trigger a
congestion control reaction.
Previously, these functions operated on a per-level basis. This however
resulted in excessive logging of in_flight and will also led to extra
work detecting underutilized congestion window in the followup patches.
On some systems the value of ngx_current_msec is derived from monotonic
clock, for which the following is defined by POSIX:
For this clock, the value returned by clock_gettime() represents
the amount of time (in seconds and nanoseconds) since an unspecified
point in the past.
As as result, overflow protection is needed when comparing two ngx_msec_t.
The change adds such protection to the ngx_quic_detect_lost() function.
Since recovery_start field was initialized with ngx_current_msec, all
congestion events that happened within the same millisecond or cycle
iteration, were treated as in recovery mode.
Also, when handling persistent congestion, initializing recovery_start
with ngx_current_msec resulted in treating all sent packets as in recovery
mode, which violates RFC 9002, see example in Appendix B.8.
While here, also fixed recovery_start wrap protection. Previously it used
2 * max_idle_timeout time frame for all sent frames, which is not a
reliable protection since max_idle_timeout is unrelated to congestion
control. Now recovery_start <= now condition is enforced. Note that
recovery_start wrap is highly unlikely and can only occur on a
32-bit system if there are no congestion events for 24 days.
Previously, the expiration caused QUIC connection finalization even if
there are application-terminated streams finishing sending data. Such
finalization terminated these streams.
An easy way to trigger this is to request a large file from HTTP/3 over
a small MTU. In this case keepalive timeout expiration may abruptly
terminate the request stream.
Improved logging for simpler data extraction for plotting congestion
window graphs. In particular, added current milliseconds number from
ngx_current_msec.
While here, simplified logging text and removed irrelevant data.
Starting with OpenSSL 3.0, groups may be added externally with pluggable
KEM providers. Using SSL_get_negotiated_group(), which makes lookup in a
static table with known groups, doesn't allow to list such groups by names
leaving them in hex. Adding X25519MLKEM768 to the default group list in
OpenSSL 3.5 made this problem more visible. SSL_get0_group_name() and,
apparently, SSL_group_to_name() allow to resolve such provider-implemented
groups, which is also "generally preferred" over SSL_get_negotiated_group()
as documented in OpenSSL git commit 93d4f6133f.
This change makes external groups listing by name using SSL_group_to_name()
available since OpenSSL 3.0. To preserve "prime256v1" naming for the group
0x0017, and to avoid breaking BoringSSL and older OpenSSL versions support,
it is used supplementary for a group that appears to be unknown.
See https://github.com/openssl/openssl/issues/27137 for related discussion.
Passwords were not preserved in optimized SSL contexts, the bug had
appeared in d791b4aab (1.23.1), as in the following configuration:
server {
proxy_ssl_password_file password;
proxy_ssl_certificate $ssl_server_name.crt;
proxy_ssl_certificate_key $ssl_server_name.key;
location /original/ {
proxy_pass https://u1/;
}
location /optimized/ {
proxy_pass https://u2/;
}
}
The fix is to always preserve passwords, by copying to the configuration
pool, if dynamic certificates are used. This is done as part of merging
"ssl_passwords" configuration.
To minimize the number of copies, a preserved version is then used for
inheritance. A notable exception is inheritance of preserved empty
passwords to the context with statically configured certificates:
server {
proxy_ssl_certificate $ssl_server_name.crt;
proxy_ssl_certificate_key $ssl_server_name.key;
location / {
proxy_pass ...;
proxy_ssl_certificate example.com.crt;
proxy_ssl_certificate_key example.com.key;
}
}
In this case, an unmodified version (NULL) of empty passwords is set,
to allow reading them from the password prompt on nginx startup.
As an additional optimization, a preserved instance of inherited
configured passwords is set to the previous level, to inherit it
to other contexts:
server {
proxy_ssl_password_file password;
location /1/ {
proxy_pass https://u1/;
proxy_ssl_certificate $ssl_server_name.crt;
proxy_ssl_certificate_key $ssl_server_name.key;
}
location /2/ {
proxy_pass https://u2/;
proxy_ssl_certificate $ssl_server_name.crt;
proxy_ssl_certificate_key $ssl_server_name.key;
}
}
It was possible to write outside of the buffer used to keep UTF-8
decoded values when parsing conversion table configuration.
Since this happened before UTF-8 decoding, the fix is to check in
advance if character codes are of more than 3-byte sequence. Note
that this is already enforced by a later check for ngx_utf8_decode()
decoded values for 0xffff, which corresponds to the maximum value
encoded as a valid 3-byte sequence, so the fix does not affect the
valid values.
Found with AddressSanitizer.
Fixes GitHub issue #529.
As uncovered by recent addition in slice.t, a partially initialized
context, coupled with HTTP 206 response from stub backend, might be
accessed in the next slice subrequest.
Found by bad memory allocator simulation.
Upstream SSL sessions may be of a noticeably larger size with tickets
in TLSv1.2 and older versions, or with "stateless" tickets in TLSv1.3,
if a client certificate is saved into the session. Further, certain
stateless session resumption implemetations may store additional data.
Such one is JDK, known to also include server certificates in session
ticket data, which roughly doubles a decoded session size to slightly
beyond the previous limit. While it's believed to be an issue on the
JDK side, this change allows to save such sessions.
Another, innocent case is using RSA certificates with 8192 key size.
Previously, request might be left in inconsistent state in case of error,
which manifested in "http request count is zero" alerts when used by SSI
filter.
The fix is to reshuffle initialization order to postpone committing state
changes until after any potentially failing parts.
Found by bad memory allocator simulation.
In OpenSSL, session resumption always happens in the default SSL context,
prior to invoking the SNI callback. Further, unlike in TLSv1.2 and older
protocols, SSL_get_servername() returns values received in the resumption
handshake, which may be different from the value in the initial handshake.
Notably, this makes the restriction added in b720f650b insufficient for
sessions resumed with different SNI server name.
Considering the example from b720f650b, previously, a client was able to
request example.org by presenting a certificate for example.org, then to
resume and request example.com.
The fix is to reject handshakes resumed with a different server name, if
verification of client certificates is enabled in a corresponding server
configuration.
The directive sets a timeout during which a keepalive connection will
not be closed by nginx for connection reuse or graceful shutdown.
The change allows clients that send multiple requests over the same
connection without delay or with a small delay between them, to avoid
receiving a TCP RST in response to one of them. This excludes network
issues and non-graceful shutdown. As a side-effect, it also addresses
the TCP reset problem described in RFC 9112, Section 9.6, when the last
sent HTTP response could be damaged by a followup TCP RST. It is important
for non-idempotent requests, which cannot be retried by client.
It is not recommended to set keepalive_min_timeout to large values as
this can introduce an additional delay during graceful shutdown and may
restrict nginx from effective connection reuse.
The build location of the resulting libatomic_ops.a was changed in v7.4.0
after converting libatomic_ops to use libtool. The fix is to use library
from the install path, this allows building with both old and new versions.
Initially reported here:
https://mailman.nginx.org/pipermail/nginx/2018-April/056054.html
This can happen with certificates and certificate keys specified
with variables due to partial cache update in various scenarios:
- cache expiration with only one element of pair evicted
- on-disk update with non-cacheable encrypted keys
- non-atomic on-disk update
The fix is to retry with fresh data on X509_R_KEY_VALUES_MISMATCH.
A new directive "ssl_certificate_cache max=N [valid=time] [inactive=time]"
enables caching of SSL certificate chain and secret key objects specified
by "ssl_certificate" and "ssl_certificate_key" directives with variables.
Co-authored-by: Aleksei Bavshin <a.bavshin@nginx.com>
SSL object cache, as previously introduced in 1.27.2, did not take
into account encrypted certificate keys that might be unexpectedly
fetched from the cache regardless of the matching passphrase. To
avoid this, caching of encrypted certificate keys is now disabled
based on the passphrase callback invocation.
A notable exception is encrypted certificate keys configured without
ssl_password_file. They are loaded once resulting in the passphrase
prompt on startup and reused in other contexts as applicable.
Memory based objects are always inherited, engine based objects are
never inherited to adhere the volatile nature of engines, file based
objects are inherited subject to modification time and file index.
The previous behaviour to bypass cache from the old configuration cycle
is preserved with a new directive "ssl_object_cache_inheritable off;".
It now uses 5/4 times more memory for the pending buffer.
Further, a single allocation is now used, which takes additional 56 bytes
for deflate_allocs in 64-bit mode aligned to 16, to store sub-allocation
pointers, and the total allocation size now padded up to 128 bytes, which
takes theoretically 200 additional bytes in total. This fits though into
"4 * (64 + sizeof(void*))" additional space for ZALLOC used in zlib-ng
2.1.x versions. The comment was updated to reflect this.
While trying to close a stream in ngx_quic_close_streams() by calling its
read event handler, the next stream saved prior to that could be destroyed
recursively. This caused a segfault while trying to access the next stream.
The way the next stream could be destroyed in HTTP/3 is the following.
A request stream read event handler ngx_http_request_handler() could
end up calling ngx_http_v3_send_cancel_stream() to report a cancelled
request stream in the decoder stream. If sending stream cancellation
decoder instruction fails for any reason, and the decoder stream is the
next in order after the request stream, the issue is triggered.
The fix is to postpone calling read event handlers for all streams being
closed to avoid closing a released stream.
Previously, such packets were treated as long header packets with unknown
version 0, and a version negotiation packet was sent in response. This
could be used to set up an infinite traffic reflect loop with another nginx
instance.
Now version negotiation packets are ignored. As per RFC 9000, Section 6.1:
An endpoint MUST NOT send a Version Negotiation packet in response to
receiving a Version Negotiation packet.
Since 0-RTT and 1-RTT data exist in the same packet number space,
ngx_quic_discard_ctx incorrectly discards 1-RTT packets when
0-RTT keys are discarded.
The issue was introduced by 58b92177e7.
In particular, an untagged CAPABILITY response as described in the
interim RFC 3501 internet drafts was seen in various IMAP servers.
Previously resulted in a broken connection, now an untagged response
is proxied to client.
When client address is received, IPv6 address could be specified without
square brackets and without port, as well as both with the brackets and
port. The change allows IPv6 in square brackets and no port, which was
previously considered an error. This format conforms to RFC 3986.
The change also affects proxy_bind and friends.
Renaming a temporary file to an empty path ("") returns NGX_ENOPATH
with a subsequent ngx_create_full_path() to create the full path.
This function skips initial bytes as part of path separator lookup,
which causes out of bounds access on short strings.
The fix is to avoid renaming a temporary file to an obviously invalid
path, as well as explicitly forbid such syntax for literal values.
Although Coverity reports about potential type underflow, it is not
actually possible because the terminating '\0' is always included.
Notably, the run-time check is sufficient enough for Win32 as well.
Other short invalid values result either in NGX_ENOENT or NGX_EEXIST
and "MoveFile() .. failed" critical log messages, which involves a
separate error handling.
Prodded by Coverity (CID 1605485).
This simplifies merging protocol values after ea15896 and ebd18ec.
Further, as outlined in ebd18ec18, for libraries preceeding TLSv1.2+
support, only meaningful versions TLSv1 and TLSv1.1 are set by default.
While here, fixed indentation.
When cropping stsc atom, it's assumed that chunk index is never 0.
Based on this assumption, start_chunk and end_chunk are calculated
by subtracting 1 from it. If chunk index is zero, start_chunk or
end_chunk may underflow, which will later trigger
"start/end time is out mp4 stco chunks" error. The change adds an
explicit check for zero chunk index to avoid underflow and report
a proper error.
Zero chunk index is explicitly banned in ISO/IEC 14496-12, 8.7.4
Sample To Chunk Box. It's also implicitly banned in QuickTime File
Format specification. Description of chunk offset table references
"Chunk 1" as the first table element.
Currently an error is triggered if any of the chunk runs in stsc are
unordered. This however does not include the final chunk run, which
ends with trak->chunks + 1. The previous chunk index can be larger
leading to a 32-bit overflow. This could allow to skip the validity
check "if (start_sample > n)". This could later lead to a large
trak->start_chunk/trak->end_chunk, which would be caught later in
ngx_http_mp4_update_stco_atom() or ngx_http_mp4_update_co64_atom().
While there are no implications of the validity check being avoided,
the change still adds a check to ensure the final chunk run is ordered,
to produce a meaningful error and avoid a potential integer overflow.
A specially crafted mp4 file with an empty run of chunks in the stsc atom
and a large value for samples per chunk for that run, combined with a
specially crafted request, allowed to store that large value in prev_samples
and later in trak->end_chunk_samples while in ngx_http_mp4_crop_stsc_data().
Later in ngx_http_mp4_update_stsz_atom() this could result in buffer
overread while calculating trak->end_chunk_samples_size.
Now the value of samples per chunk specified for an empty run is ignored.
Absolute paths in links end up being rooted at github.com.
The contributing guidelines link is broken unless we use the full URL.
Also, remove superfluous "monospace formatting" for the link.
Previously, all upstream DNS entries would be immediately re-resolved
on config reload. With a large number of upstreams, this creates
a spike of DNS resolution requests. These spikes can overwhelm the
DNS server or cause drops on the network.
This patch retains the TTL of previous resolutions across reloads
by copying each upstream's name's expiry time across configuration
cycles. As a result, no additional resolutions are needed.
After configuration is reloaded, it may take some time for the
re-resolvable upstream servers to resolve and become available
as peers. During this time, client requests might get dropped.
Such servers are now pre-resolved using the "cache" of already
resolved peers from the old shared memory zone.
Specifying the upstream server by a hostname together with the
"resolve" parameter will make the hostname to be periodically
resolved, and upstream servers added/removed as necessary.
This requires a "resolver" at the "http" configuration block.
The "resolver_timeout" parameter also affects when the failed
DNS requests will be attempted again. Responses with NXDOMAIN
will be attempted again in 10 seconds.
Upstream has a configuration generation number that is incremented each
time servers are added/removed to the primary/backup list. This number
is remembered by the peer.init method, and if peer.get detects a change
in configuration, it returns NGX_BUSY.
Each server has a reference counter. It is incremented by peer.get and
decremented by peer.free. When a server is removed, it is removed from
the list of servers and is marked as "zombie". The memory allocated by
a zombie peer is freed only when its reference count becomes zero.
Co-authored-by: Roman Arutyunyan <arut@nginx.com>
Co-authored-by: Sergey Kandaurov <pluknet@nginx.com>
Co-authored-by: Vladimir Homutov <vl@nginx.com>
TLSv1 and TLSv1.1 are formally deprecated and forbidden to negotiate due
to insufficient security reasons outlined in RFC 8996.
TLSv1 and TLSv1.1 are disabled in BoringSSL e95b0cad9 and LibreSSL 3.8.1
in the way they cannot be enabled in nginx configuration. In OpenSSL 3.0,
they are only permitted at security level 0 (disabled by default).
The support is dropped in Chrome 84, Firefox 78, and deprecated in Safari.
This change disables TLSv1 and TLSv1.1 by default for OpenSSL 1.0.1 and
newer, where TLSv1.2 support is available. For older library versions,
which do not have alternatives, these protocol versions remain enabled.
Since a2a513b93c, stream frames no longer need to be retransmitted after it
was deleted. The frames which were retransmitted before, could be stream data
frames sent prior to a RESET_STREAM. Such retransmissions are explicitly
prohibited by RFC 9000, Section 19.4.
EVP_KEY objects are a reference-counted container for key material, shallow
copies and OpenSSL stack management aren't needed as with certificates.
Based on previous work by Mini Hawthorne.
Certificate chains are now loaded once.
The certificate cache provides each chain as a unique stack of reference
counted elements. This shallow copy is required because OpenSSL stacks
aren't reference counted.
Based on previous work by Mini Hawthorne.
Added ngx_openssl_cache_module, which indexes a type-aware object cache.
It maps an id to a unique instance, and provides references to it, which
are dropped when the cycle's pool is destroyed.
The cache will be used in subsequent patches.
Based on previous work by Mini Hawthorne.
Instead of cross-linking the objects using exdata, pointers to configured
certificates are now stored in ngx_ssl_t, and OCSP staples are now accessed
with rbtree in it. This allows sharing these objects between SSL contexts.
Based on previous work by Mini Hawthorne.
Starting from TLSv1.1 (as seen since draft-ietf-tls-rfc2246-bis-00),
the "certificate_authorities" field grammar of the CertificateRequest
message was redone to allow no distinguished names. In TLSv1.3, with
the restructured CertificateRequest message, this can be similarly
done by optionally including the "certificate_authorities" extension.
This allows to avoid sending DNs at all.
In practice, aside from published TLS specifications, all supported
SSL/TLS libraries allow to request client certificates with an empty
DN list for any protocol version. For instance, when operating in
TLSv1, this results in sending the "certificate_authorities" list as
a zero-length vector, which corresponds to the TLSv1.1 specification.
Such behaviour goes back to SSLeay.
The change relaxes the requirement to specify at least one trusted CA
certificate in the ssl_client_certificate directive, which resulted in
sending DNs of these certificates (closes#142). Instead, all trusted
CA certificates can be specified now using the ssl_trusted_certificate
directive if needed. A notable difference that certificates specified
in ssl_trusted_certificate are always loaded remains (see 3648ba7db).
Co-authored-by: Praveen Chaudhary <praveenc@nvidia.com>
Pushes to master and stable branches will result in buildbot-like checks
on multiple OSes and architectures.
Pull requests will be checked on a public Ubuntu GitHub runner.
Unordered chunks could result in trak->end_chunk smaller than trak->start_chunk
in ngx_http_mp4_crop_stsc_data(). Later in ngx_http_mp4_update_stco_atom()
this caused buffer overread while trying to calculate trak->end_offset.
While cropping an stsc atom in ngx_http_mp4_crop_stsc_data(), a 32-bit integer
overflow could happen, which could result in incorrect seeking and a very large
value stored in "samples". This resulted in a large invalid value of
trak->end_chunk_samples. This value is further used to calculate the value of
trak->end_chunk_samples_size in ngx_http_mp4_update_stsz_atom(). While doing
this, a large invalid value of trak->end_chunk_samples could result in reading
memory before stsz atom start. This could potentially result in a segfault.
In some rare cases, graceful shutdown may happen while initializing an HTTP/2
connection. Previously, such a connection ignored the shutdown and remained
active. Now it is gracefully closed prior to processing any streams to
eliminate the shutdown delay.
Previously handlers were mandatory. However they are not always needed.
For example, a server configured with ssl_reject_handshake does not need a
handler. Such servers required a fake handler to pass the check. Now handler
absence check is moved to runtime. If handler is missing, the connection is
closed with 500 code.
Previously the last chain field of ngx_quic_buffer_t could still reference freed
chains and buffers after calling ngx_quic_free_buffer(). While normally an
ngx_quic_buffer_t object should not be used after freeing, resetting last_chain
field would prevent a potential use-after-free.
Sending handshake-level CRYPTO frames after the client's Finished message could
lead to memory disclosure and a potential segfault, if those frames are sent in
one packet with the Finished frame.
While inserting a new entry into the dynamic table, first the entry is added,
and then older entries are evicted until table size is within capacity. After
the first step, the number of entries may temporarily exceed the maximum
calculated from capacity by one entry, which previously caused table overflow.
The easiest way to trigger the issue is to keep adding entries with empty names
and values until first eviction.
The issue was introduced by 987bee4363d1.
Previously a decoder stream was created on demand for sending Section
Acknowledgement, Stream Cancellation and Insert Count Increment. If conditions
for sending any of these instructions never happen, a decoder stream is not
created at all. These conditions include client not using the dynamic table and
no streams abandoned by server (RFC 9204, Section 2.2.2.2). However RFC 9204,
Section 4.2 defines only one condition for not creating a decoder stream:
An endpoint MAY avoid creating a decoder stream if its decoder sets
the maximum capacity of the dynamic table to zero.
The change enables pre-creation of the decoder stream at HTTP/3 session
initialization if maximum dynamic table capacity is not zero. Note that this
value is currently hardcoded to 4096 bytes and is not configurable, so the
stream is now always created.
Also, the change fixes a potential stack overflow when creating a decoder
stream in ngx_http_v3_send_cancel_stream() while draining a request stream by
ngx_drain_connections(). Creating a decoder stream involves calling
ngx_get_connection(), which calls ngx_drain_connections(), which will drain the
same request stream again. If client's MAX_STREAMS for uni stream is high
enough, these recursive calls will continue until we run out of stack.
Otherwise, decoder stream creation will fail at some point and the request
stream connection will be drained. This may result in use-after-free, since
this connection could still be referenced up the stack.
Previously chain links could sometimes be dropped instead of being reused,
which could result in increased memory consumption during long requests.
A similar chain link issue in ngx_http_gzip_filter_module was fixed in
da46bfc484ef (1.11.10).
Based on a patch by Sangmin Lee.
Using "long *" instead of "AO_t *" leads either to -Wincompatible-pointer-types
or -Wpointer-sign warnings, depending on whether long and size_t are compatible
types (e.g., ILP32 versus LP64 data models). Notably, -Wpointer-sign warnings
are enabled by default in Clang only, and -Wincompatible-pointer-types is an
error starting from GCC 14.
Signed-off-by: Edgar Bonet <bonet@grenoble.cnrs.fr>
Passing from udp was not possible for the most part due to preread buffer
restriction. Passing to udp could occasionally work, but the connection would
still be bound to the original listen rbtree, which prevented it from being
deleted on connection closure.
When loading certificate keys via ENGINE_load_private_key() in runtime,
it was possible to overwrite configuration on ENGINE_by_id() failure.
OpenSSL documention doesn't describe errors in details, the only reason
I found in the comment to example is when the engine is not available.
Previously, a request body larger than declared in Content-Length resulted in
a 413 status code, because Content-Length was mistakenly used as the maximum
allowed request body, similar to client_max_body_size. Following the HTTP/3
specification, such requests are now rejected with the 400 error as malformed.
The ngx_quic_run() function uses qc->close timer to limit the handshake
duration. Normally it is removed by ngx_quic_do_init_streams() which is
called once when we are done with initial SSL processing.
The problem happens when the client sends early data and streams are
initialized in the ngx_quic_run() -> ngx_quic_handle_datagram() call.
The order of set/remove timer calls is now reversed; the close timer is
set up and the timer fires when assigned, starting the unexpected connection
close process.
The fix is to skip setting the timer if streams were initialized during
handling of the initial datagram. The idle timer for quic is set anyway,
and stream-related timeouts are managed by application layer.
Notably, Apple Silicon CPUs have 128 byte cache line size,
which is twice the default configured for generic aarch64.
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Previously, the response text wasn't initialized and the rewrite module
was sending response body set to NULL.
Found with UndefinedBehaviorSanitizer (pointer-overflow).
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Previously, it could result when left-shifting signed integer due to implicit
integer promotion, such that the most significant bit appeared on the sign bit.
In practice, though, this results in the same left value as with an explicit
cast, at least on known compilers, such as GCC and Clang. The reason is that
in_addr_t, which is equivalent to uint32_t and same as "unsigned int" in ILP32
and LP64 data type models, has the same type width as the intermediate after
integer promotion, so there's no side effects such as sign-extension. This
explains why adding an explicit cast does not change object files in practice.
Found with UndefinedBehaviorSanitizer (shift).
Based on a patch by Piotr Sikora.
While copying ngx_http_variable_value_t structures to geo binary base
in ngx_http_geo_copy_values(), and similarly in the stream module,
uninitialized parts of these structures are copied as well. These
include the "escape" field and possible holes. Calculating crc32 of
this data triggers uninitialized memory access.
Found with MemorySanitizer.
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
In preparation for adding more parameters to the listen directive,
and to be in sync with the corresponding structure in the http module.
No functional changes.
Originally, the stream module was developed based on the mail module,
following the existing style. Then it was diverged to closely follow
the http module development. This change updates style to use sscf
naming convention troughout the stream module, which matches the http
module code style. No functional changes.
The module allows to pass connections from Stream to other modules such as HTTP
or Mail, as well as back to Stream. Previously, this was only possible with
proxying. Connections with preread buffer read out from socket cannot be
passed.
The module allows selective SSL termination based on SNI.
stream {
server {
listen 8000 default_server;
ssl_preread on;
...
}
server {
listen 8000;
server_name foo.example.com;
pass 127.0.0.1:8001; # to HTTP
}
server {
listen 8000;
server_name bar.example.com;
...
}
}
http {
server {
listen 8001 ssl;
...
location / {
root html;
}
}
}
Server name is taken either from ngx_stream_ssl_module or
ngx_stream_ssl_preread_module.
The change adds "default_server" parameter to the "listen" directive,
as well as the following directives: "server_names_hash_max_size",
"server_names_hash_bucket_size", "server_name" and "ssl_reject_handshake".
Previously, preread buffer was always read out from socket, which made it
impossible to terminate SSL on the connection without introducing additional
SSL BIOs. The following patches will rely on this.
Now, when possible, recv(MSG_PEEK) is used instead, which keeps data in socket.
It's called if SSL is not already terminated and if an egde-triggered event
method is used. For epoll, EPOLLRDHUP support is also required.
Stream connection cleanup handler ngx_quic_stream_cleanup_handler() calls
ngx_quic_shutdown_stream() after which it resets the pointer from quic stream
to the connection (sc->connection = NULL). Previously if this call failed,
sc->connection retained the old value, while the connection was freed by the
application code. This resulted later in a second attempt to close the freed
connection, which lead to allocator double free error.
The fix is to reset the sc->connection pointer in case of error.
Inspired by RFC 9001, Section 6.3, trial packet decryption with the current
keys is now used to avoid a timing side-channel signal. Further, this fixes
segfault while accessing missing next keys (ticket #2585).
Previously if an MTU probe send failed early in ngx_quic_frame_sendto()
due to allocation error or congestion control, the application level packet
number was not increased, but was still saved as MTU probe packet number.
Later when a packet with this number was acknowledged, the unsent MTU probe
was acknowledged as well. This could result in discovering a bigger MTU than
supported by the path, which could lead to EMSGSIZE (Message too long) errors
while sending further packets.
The problem existed since PMTUD was introduced in 58afcd72446f (1.25.2).
Back then only the unlikely memory allocation error could trigger it. However
in efcdaa66df2e congestion control was added to ngx_quic_frame_sendto() which
can now trigger the issue with a higher probability.
Now "fastopen", "backlog", "accept_filter", "deferred", and "so_keepalive"
parameters are not allowed with "quic" in the "listen" directive.
Reported by Izorkin.
When filter finalization is triggered when working with an upstream server,
and error_page redirects request processing to some simple handler,
ngx_http_request_finalize() triggers request termination when the response
is sent. In particular, via the upstream cleanup handler, nginx will close
the upstream connection and the corresponding socket.
Still, this can happen to be with ngx_event_pipe() on stack. While
the code will set p->downstream_error due to NGX_ERROR returned from the
output filter chain by filter finalization, otherwise the error will be
ignored till control returns to ngx_http_upstream_process_request().
And event pipe might try reading from the (already closed) socket, resulting
in "readv() failed (9: Bad file descriptor) while reading upstream" errors
(or even segfaults with SSL).
Such errors were seen with the following configuration:
location /t2 {
proxy_pass http://127.0.0.1:8080/big;
image_filter_buffer 10m;
image_filter resize 150 100;
error_page 415 = /empty;
}
location /empty {
return 204;
}
location /big {
# big enough static file
}
Fix is to clear p->upstream in ngx_http_upstream_finalize_request(),
and ensure that p->upstream is checked in ngx_event_pipe_read_upstream()
and when handling events at ngx_event_pipe() exit.
When a request was terminated due to an error via ngx_http_terminate_request()
while an AIO operation was running in a subrequest, various issues were
observed. This happened because ngx_http_request_finalizer() was only set
in the subrequest where ngx_http_terminate_request() was called, but not
in the subrequest where the AIO operation was running. After completion
of the AIO operation normal processing of the subrequest was resumed, leading
to issues.
In particular, in case of the upstream module, termination of the request
called upstream cleanup, which closed the upstream connection. Attempts to
further work with the upstream connection after AIO operation completion
resulted in segfaults in ngx_ssl_recv(), "readv() failed (9: Bad file
descriptor) while reading upstream" errors, or socket leaks.
In ticket #2555, issues were observed with the following configuration
with cache background update (with thread writing instrumented to
introduce a delay, when a client closes the connection during an update):
location = /background-and-aio-write {
proxy_pass ...
proxy_cache one;
proxy_cache_valid 200 1s;
proxy_cache_background_update on;
proxy_cache_use_stale updating;
aio threads;
aio_write on;
limit_rate 1000;
}
Similarly, the same issue can be seen with SSI, and can be caused by
errors in subrequests, such as in the following configuration
(where "/proxy" uses AIO, and "/sleep" returns 444 after some delay,
causing request termination):
location = /ssi-active-boom {
ssi on;
ssi_types *;
return 200 '
<!--#include virtual="/proxy" -->
<!--#include virtual="/sleep" -->
';
limit_rate 1000;
}
Or the same with both AIO operation and the error in non-active subrequests
(which needs slightly different handling, see below):
location = /ssi-non-active-boom {
ssi on;
ssi_types *;
return 200 '
<!--#include virtual="/static" -->
<!--#include virtual="/proxy" -->
<!--#include virtual="/sleep" -->
';
limit_rate 1000;
}
Similarly, issues can be observed with just static files. However,
with static files potential impact is limited due to timeout safeguards
in ngx_http_writer(), and the fact that c->error is set during request
termination.
In a simple configuration with an AIO operation in the active subrequest,
such as in the following configuration, the connection is closed right
after completion of the AIO operation anyway, since ngx_http_writer()
tries to write to the connection and fails due to c->error set:
location = /ssi-active-static-boom {
ssi on;
ssi_types *;
return 200 '
<!--#include virtual="/static-aio" -->
<!--#include virtual="/sleep" -->
';
limit_rate 1000;
}
In the following configuration, with an AIO operation in a non-active
subrequest, the connection is closed only after send_timeout expires:
location = /ssi-non-active-static-boom {
ssi on;
ssi_types *;
return 200 '
<!--#include virtual="/static" -->
<!--#include virtual="/static-aio" -->
<!--#include virtual="/sleep" -->
';
limit_rate 1000;
}
Fix is to introduce r->main->terminated flag, which is to be checked
by AIO event handlers when the r->main->blocked counter is decremented.
When the flag is set, handlers are expected to wake up the connection
instead of the subrequest (which might be already cleaned up).
Additionally, now ngx_http_request_finalizer() is always set in the
active subrequest, so waking up the connection properly finalizes the
request even if termination happened in a non-active subrequest.
Each AIO (thread IO) operation being run is now accompanied with 1-minute
timer. This timer prevents unexpected shutdown of the worker process while
an AIO operation is running, and logs an alert if the operation is running
for too long.
This fixes "open socket left" alerts during worker processes shutdown
due to pending AIO (or thread IO) operations while corresponding requests
have no timers. In particular, such errors were observed while reading
cache headers (ticket #2162), and with worker_shutdown_timeout.
When graceful shutdown was requested, and then nginx was forced to
do fast shutdown, it used to (incorrectly) complain about open sockets
left in connections which weren't yet closed when fast shutdown
was requested.
Fix is to avoid complaining about open sockets when fast shutdown was
requested after graceful one. Abnormal termination, if requested with
the WINCH signal, can still happen though.
OPENSSL_VERSION_NUMBER is now redefined to 0x1010000fL for LibreSSL 3.5.0
and above. Building with older LibreSSL versions, such as 2.8.0, may now
produce warnings (see cab37803ebb3) and may require appropriate compiler
options to suppress them.
Notably, this allows to start using SSL_get0_verified_chain() appeared
in OpenSSL 1.1.0 and LibreSSL 3.5.0, without additional macro tests.
Prodded by Ilya Shipitsin.
Similar to 7356:e3ba4026c02d, as long as SSL_OP_NO_CLIENT_RENEGOTIATION
is defined, it is the library responsibility to prevent renegotiation.
Additionally, this allows to raise LibreSSL version used to redefine
OPENSSL_VERSION_NUMBER to 0x1010000fL, such that this won't result in
attempts to dereference SSL objects made opaque in LibreSSL 3.4.0.
Patch by Maxim Dounin.
rand() is used on win32. RAND_MAX is implementation defined. win32's is
0x7fff.
Existing uses of ngx_random() rely upon 0x7fffffff range provided by
POSIX implementations of random().
On-packet acknowledgement is made path aware, as per RFC 9000, Section 9.4:
Packets sent on the old path MUST NOT contribute to congestion control
or RTT estimation for the new path.
To make this possible in a single congestion control context, the first packet
to be sent after the new path has been validated, which includes resetting the
congestion controller and RTT estimator, is now remembered in the connection.
Packets sent previously, such as on the old path, are not taken into account.
Note that although the packet number is saved per-connection, the added checks
affect application level packets only. For non-application level packets,
which are only processed prior to the handshake is complete, the remembered
packet number remains set to zero.
As per RFC 9000, Section 8.2.1:
When an endpoint is unable to expand the datagram size to 1200 bytes due
to the anti-amplification limit, the path MTU will not be validated.
To ensure that the path MTU is large enough, the endpoint MUST perform a
second path validation by sending a PATH_CHALLENGE frame in a datagram of
at least 1200 bytes.
Previously ngx_quic_frame_sendto() ignored congestion control and did not
contribute to in_flight counter.
Now congestion control window is checked unless ignore_congestion flag is set.
Also, in_flight counter is incremented and the frame is stored in ctx->sent
queue if it's ack-eliciting. This behavior is now similar to
ngx_quic_output_packet().
According to RFC 9000, an endpoint SHOULD NOT send multiple PATH_CHALLENGE
frames in a single packet. The change adds a check to enforce this claim to
optimize server behavior. Previously each PATH_CHALLENGE always resulted in a
single response datagram being sent to client. The effect of this was however
limited by QUIC flood protection.
Also, PATH_CHALLENGE is explicitly disabled in Initial and Handshake levels,
see RFC 9000, Table 3. However, technically it may be sent by client in 0-RTT
over a new path without actual migration, even though the migration itself is
prohibited during handshake. This allows client to coalesce multiple 0-RTT
packets each carrying a PATH_CHALLENGE and end up with multiple PATH_CHALLENGEs
per datagram. This again leads to suboptimal behavior, see above. Since the
purpose of sending PATH_CHALLENGE frames in 0-RTT is unclear, these frames are
now only allowed in 1-RTT. For 0-RTT they are silently ignored.
Previously, when using ngx_quic_frame_sendto() to explicitly send a packet with
a single frame, anti-amplification limit was not properly enforced. Even when
there was no quota left for the packet, it was sent anyway, but with no padding.
Now the packet is not sent at all.
This function is called to send PATH_CHALLENGE/PATH_RESPONSE, PMTUD and probe
packets. For all these cases packet send is retried later in case the send was
not successful.
By default packets with these frames are expanded to 1200 bytes. Previously,
if anti-amplification limit did not allow this expansion, it was limited to
whatever size was allowed. However RFC 9000 clearly states no partial
expansion should happen in both cases.
Section 8.2.1. Initiating Path Validation:
An endpoint MUST expand datagrams that contain a PATH_CHALLENGE frame
to at least the smallest allowed maximum datagram size of 1200 bytes,
unless the anti-amplification limit for the path does not permit
sending a datagram of this size.
Section 8.2.2. Path Validation Responses:
An endpoint MUST expand datagrams that contain a PATH_RESPONSE frame
to at least the smallest allowed maximum datagram size of 1200 bytes.
...
However, an endpoint MUST NOT expand the datagram containing the
PATH_RESPONSE if the resulting data exceeds the anti-amplification limit.
If URI is not fully parsed yet, some pointers are not set. As a result,
the calculation of "new + (ptr - old)" expression is flawed.
According to C11, 6.5.6 Additive operators, p.9:
: When two pointers are subtracted, both shall point to elements
: of the same array object, or one past the last element of the
: array object
Since "ptr" is not set, subtraction leads to undefined behaviour, because
"ptr" and "old" are not in the same buffer (i.e. array objects).
Prodded by GCC undefined behaviour sanitizer.
Neither r->port_start nor r->port_end were ever used.
The r->port_end is set by the parser, though it was never used by
the following code (and was never usable, since not copied by the
ngx_http_alloc_large_header_buffer() without r->port_start set).
Currently, packets generated by ngx_quic_frame_sendto() and
ngx_quic_send_early_cc() are not logged, thus making it hard
to read logs due to gaps appearing in packet numbers sequence.
At frames level, it is handy to see immediately packet number
in which they arrived or being sent.
As part of normal HTTP/2 processing, incomplete frames are saved in the
control state using a fixed size memcpy of NGX_HTTP_V2_STATE_BUFFER_SIZE.
For this matter, two state buffers are reserved in the HTTP/2 recv buffer.
As part of HTTP/2 auto-detection on plain TCP connections, initial data
is first read into a buffer specified by the client_header_buffer_size
directive that doesn't have state reservation. Previously, this made it
possible to over-read the buffer as part of saving the state.
The fix is to read the available buffer size rather than a fixed size.
Although memcpy of a fixed size can produce a better optimized code,
handling of incomplete frames isn't a common execution path, so it was
sacrificed for the sake of simplicity of the fix.
It is made local as it is only needed now when creating crypto context.
BoringSSL lacks EVP interface for ChaCha20, providing instead
a function for one-shot encryption, thus hp is still preserved.
Based on a patch by Roman Arutyunyan.
Previously it was possible to generate ACK frames using formally discarded
protection keys, in particular, when acknowledging a client Handshake packet
used to complete the TLS handshake and to discard handshake protection keys.
As it happens late in packet processing, it could be possible to generate ACK
frames after the keys were already discarded.
ACK frames are generated from ngx_quic_ack_packet(), either using a posted
push event, which envolves ngx_quic_generate_ack() as a part of the final
packet assembling, or directly in ngx_quic_ack_packet(), such as when there
is no room to add a new ACK range or when the received packet is out of order.
The added keys availability check is used to avoid generating late ACK frames
in both cases.
In addition to triggering alert, it ensures that such packets won't be sent.
With the previous change that marks server keys as discarded by zeroing the
key lengh, it is now an error to send packets with discarded keys. OpenSSL
based stacks tolerate such behaviour because key length isn't used in packet
protection, but BoringSSL will raise the UNSUPPORTED_KEY_SIZE cipher error.
It won't be possible to use discarded keys with reused crypto contexts as it
happens in subsequent changes.
Keys may be released by TLS stack in different times, so it makes sense
to check this independently as well. This allows to fine-tune what key
direction is used when checking keys availability.
When discarding, server keys are now marked in addition to client keys.
This improves nginx startup times significantly when using very large number
of locations due to computational complexity of the sorting algorithm being
used: insertion sort is O(n*n) on average, while merge sort is O(n*log(n)).
In particular, in a test configuration with 20k locations total startup
time is reduced from 8 seconds to 0.9 seconds.
Prodded by Yusuke Nojima,
https://mailman.nginx.org/pipermail/nginx-devel/2023-September/NUL3Y2FPPFSHMPTFTL65KXSXNTX3NQMK.html
In ngx_regex_cleanup() allocator wasn't configured when calling
pcre2_compile_context_free() and pcre2_match_data_free(), resulting
in no ngx_free() call and leaked memory. Fix is ensure that allocator
is configured for global allocations, so that ngx_free() is actually
called to free memory.
Additionally, ngx_regex_compile_context was cleared in
ngx_regex_module_init(). It should be either not cleared, so it will
be freed by ngx_regex_cleanup(), or properly freed. Fix is to
not clear it, so ngx_regex_cleanup() will be able to free it.
Reported by ZhenZhong Wu,
https://mailman.nginx.org/pipermail/nginx-devel/2023-September/3Z5FIKUDRN2WBSL3JWTZJ7SXDA6YIWPB.html
To ensure that attempts to flood servers with many streams are detected
early, a limit of no more than 2 * max_concurrent_streams new streams per one
event loop iteration was introduced. This limit is applied even if
max_concurrent_streams is not yet reached - for example, if corresponding
streams are handled synchronously or reset.
Further, refused streams are now limited to maximum of max_concurrent_streams
and 100, similarly to priority_limit initial value, providing some tolerance
to clients trying to open several streams at the connection start, yet
low tolerance to flooding attempts.
The error may be triggered in add_handhshake_data() by incorrect transport
parameter sent by client. The expected behaviour in this case is to close
connection complaining about incorrect parameter. Currently the connection
just times out.
Previously, the timer was never reset due to an explicit check. The check was
added in 36b59521a41c as part of connection close simplification. The reason
was to retain the earliest timeout. However, the timeouts are all the same
while QUIC handshake is in progress and resetting the timer for the same value
has no performance implications. After handshake completion there's only
application level. The change removes the check.
Now the session object is assigned to c->data while ngx_http_connection_t
object is referenced by its http_connection field, similar to
ngx_http_v2_connection_t and ngx_http_request_t.
The change allows to eliminate v3_session field from ngx_http_connection_t.
The field was under NGX_HTTP_V3 macro, which was a source of binary
compatibility problems when nginx/module is build with/without HTTP/3 support.
Postponing is essential since c->data should retain the reference to
ngx_http_connection_t object throughout QUIC handshake, because SSL callbacks
ngx_http_ssl_servername() and ngx_http_ssl_alpn_select() rely on this.
Instead, when worker is shutting down and handshake is not yet completed,
connection is terminated immediately.
Previously the callback could be called while QUIC handshake was in progress
and, what's more important, before the init() callback. Now it's postponed
after init().
This change is a preparation to postponing HTTP/3 session creation to init().
Previously QUIC did not have such parameter and handshake duration was
controlled by HTTP/3. However that required creating and storing HTTP/3
session on first client datagram. Apparently there's no convenient way to
store the session object until QUIC handshake is complete. In the followup
patches session creation will be postponed to init() callback.
As explained in BoringSSL change[1], levels were introduced in the original
QUIC API to draw a line between when keys are released and when are active.
In the new QUIC API they are released in separate calls when it's needed.
BoringSSL has then a consideration to remove levels API, hence the change.
If not available e.g. from a QUIC packet header, levels can be taken based on
keys availability. The only real use of levels is to prevent using app keys
before they are active in QuicTLS that provides the old BoringSSL QUIC API,
it is replaced with an equivalent check of c->ssl->handshaked.
This change also removes OpenSSL compat shims since they are no longer used.
The only exception left is caching write level from the keylog callback in
the internal field which is a handy equivalent of checking keys availability.
[1] https://boringssl.googlesource.com/boringssl/+/1e859054
As per RFC 9000, section 10.2.3, to ensure that peer successfully removed
packet protection, CONNECTION_CLOSE can be sent in multiple packets using
different packet protection levels.
Now it is sent in all protection levels available.
This roughly corresponds to the following paragraph:
* Prior to confirming the handshake, a peer might be unable to process 1-RTT
packets, so an endpoint SHOULD send a CONNECTION_CLOSE frame in both Handshake
and 1-RTT packets. A server SHOULD also send a CONNECTION_CLOSE frame in an
Initial packet.
In practice, this change allows to avoid sending an Initial packet when we know
the client has handshake keys, by checking if we have discarded initial keys.
Also, this fixes sending CONNECTION_CLOSE when using QuicTLS with old QUIC API,
where TLS stack releases application read keys before handshake confirmation;
it is fixed by sending CONNECTION_CLOSE additionally in a Handshake packet.
Status header with an empty reason-phrase, such as "Status: 404 ", is
valid per CGI specification, but looses the trailing space during parsing.
Currently, this results in "HTTP/1.1 404" HTTP status line in the response,
which violates HTTP specification due to missing trailing space.
With this change, only the status code is used from such short Status
header lines, so nginx will generate status line itself, with the space
and appropriate reason phrase if available.
Reported at:
https://mailman.nginx.org/pipermail/nginx/2023-August/EX7G4JUUHJWJE5UOAZMO5UD6OJILCYGX.html
Previously, a socket error on a path being validated resulted in validation
error and subsequent QUIC connection closure. Now the error is ignored and
path validation proceeds as usual, with several retries and a timeout.
When validating the old path after an apparent migration, that path may already
be unavailable and sendmsg() may return an error, which should not result in
QUIC connection close.
When validating the new path, it's possible that the new client address is
spoofed (See RFC 9000, 9.3.2. On-Path Address Spoofing). This address may
as well be unavailable and should not trigger QUIC connection closure.
Previously, original dcid was used to receive initial client packets in case
server initial response was lost. However, last dcid should be used instead.
These two are the same unless retry is used. In case of retry, client resends
initial packet with a new dcid, that is different from the original dcid. If
server response is lost, the client resends this packet again with the same
dcid. This is shown in RFC 9000, 7.3. Authenticating Connection IDs, Figure 8.
The issue manifested itself with creating multiple server sessions in response
to each post-retry client initial packet, if server response is lost.
Since at least f9fbeb4ee0de and certainly after 924882f42dea, which
TLS Key Update support predates, queued data output is deferred to a
posted push handler. To address timing signals after these changes,
generating next keys is now posted to run after the push handler.
Previously, NGX_AGAIN returned by ngx_quic_send() was treated by
ngx_quic_frame_sendto() as error, which triggered errors in its callers.
However, a blocked socket is not an error. Now NGX_AGAIN is passed as is to
the ngx_quic_frame_sendto() callers, which can safely ignore it.
When probe timeout expired while congestion window was exhausted, probe PINGs
could not be sent. As a result, lost packets could not be declared lost and
congestion window could not be freed for new packets. This deadlock
continued until connection idle timeout expiration.
Now PINGs are sent separately from the frame queue without congestion control,
as specified by RFC 9002, Section 7:
An endpoint MUST NOT send a packet if it would cause bytes_in_flight
(see Appendix B.2) to be larger than the congestion window, unless the
packet is sent on a PTO timer expiration (see Section 6.2) or when entering
recovery (see Section 7.3.2).
Previously, PTO handler analyzed the first packet in the sent queue for the
timeout expiration. However, the last sent packet should be analyzed instead.
An example is timeout calculation in ngx_quic_set_lost_timer().
Previously the field pnum of a potentially freed frame was accessed. Now the
value is copied to a local variable. The old behavior did not cause any
problems since the frame memory is not freed, but is moved to a free queue
instead.
In non-GSO mode, a datagram is sent if congestion window is not exceeded by the
time of send. The window could be exceeded by a small amount after the send.
In GSO mode, congestion window was checked in a similar way, but for all
concatenated datagrams as a whole. This could result in exceeding congestion
window by a lot. Now congestion window is checked for every datagram in GSO
mode as well.
Previously it was added to the tail as all other frames. However, if the
amount of queued data is large, it could delay the delivery of ACK, which
could trigger frames retransmissions and slow down the connection.
Previously ACK was not generated if max_ack_delay was not yet expired and the
number of unacknowledged ack-eliciting packets was less than two, as allowed by
RFC 9000 13.2.1-13.2.2. However this only makes sense to avoid sending ACK-only
packets, as explained by the RFC:
On the other hand, reducing the frequency of packets that carry only
acknowledgments reduces packet transmission and processing cost at both
endpoints.
Now ACK is delayed only if output frame queue is empty. Otherwise ACK is sent
immediately, which significantly improves QUIC performance with certain tests.
With this change, the NGX_OPENSSL_NO_CONFIG macro is defined when nginx
is asked to build OpenSSL itself. And with this macro automatic loading
of OpenSSL configuration (from the build directory) is prevented unless
the OPENSSL_CONF environment variable is explicitly set.
Note that not loading configuration is broken in OpenSSL 1.1.1 and 1.1.1a
(fixed in OpenSSL 1.1.1b, see https://github.com/openssl/openssl/issues/7350).
If nginx is used to compile these OpenSSL versions, configuring nginx with
NGX_OPENSSL_NO_CONFIG explicitly set to 0 might be used as a workaround.
Following OpenSSL 0.9.8f, OpenSSL tries to load application-specific
configuration section first, and then falls back to the "openssl_conf"
default section if application-specific section is not found, by using
CONF_modules_load_file(CONF_MFLAGS_DEFAULT_SECTION). Therefore this
change is not expected to introduce any compatibility issues with existing
configurations. It does, however, make it easier to configure specific
OpenSSL settings for nginx in system-wide OpenSSL configuration
(ticket #2449).
Instead of checking OPENSSL_VERSION_NUMBER when using the OPENSSL_init_ssl()
interface, the code now tests for OPENSSL_INIT_LOAD_CONFIG to be defined and
true, and also explicitly excludes LibreSSL. This ensures that this interface
is not used with BoringSSL and LibreSSL, which do not provide additional
library initialization settings, notably the OPENSSL_INIT_set_config_appname()
call.
Similarly to 6822:c045b4926b2c, environment variables introduced with
the "env" directive (and "NGINX_BPF_MAPS" added by QUIC) are now allocated
via ngx_alloc(), and explicitly freed by a cleanup handler if no longer used.
In collaboration with Sergey Kandaurov.
Previously used constant EVP_GCM_TLS_TAG_LEN had misleading name since it was
used not only with GCM, but also with CHACHAPOLY. Now a new constant
NGX_QUIC_TAG_LEN introduced. Luckily all AEAD algorithms used by QUIC have
the same tag length of 16.
Previously, computing rttvar used an updated smoothed_rtt value as per
RFC 9002, section 5.3, which appears to be specified in a wrong order.
A technical errata ID 7539 is reported.
Although it has better implementation status than HTTP/3 server push,
it remains of limited use, with adoption numbers seen as negligible.
Per IETF 102 materials, server push was used only in 0.04% of sessions.
It was considered to be "difficult to use effectively" in RFC 9113.
Its use is further limited by badly matching to fetch/cache/connection
models in browsers, see related discussions linked from [1].
Server push was disabled in Chrome 106 [2].
The http2_push, http2_push_preload, and http2_max_concurrent_pushes
directives are made obsolete. In particular, this essentially reverts
7201:641306096f5b and 7207:3d2b0b02bd3d.
[1] https://jakearchibald.com/2017/h2-push-tougher-than-i-thought/
[2] https://chromestatus.com/feature/6302414934114304
It has been deprecated since 7270:46c0c7ef4913 (1.15.0) in favour of
the "ssl" parameter of the "listen" directive, which has been available
since 2224:109849282793 (0.7.14).
The directive enables HTTP/2 in the current server. The previous way to
enable HTTP/2 via "listen ... http2" is now deprecated. The new approach
allows to share HTTP/2 and HTTP/0.9-1.1 on the same port.
For SSL connections, HTTP/2 is now selected by ALPN callback based on whether
the protocol is enabled in the virtual server chosen by SNI. This however only
works since OpenSSL 1.0.2h, where ALPN callback is invoked after SNI callback.
For older versions of OpenSSL, HTTP/2 is enabled based on the default virtual
server configuration.
For plain TCP connections, HTTP/2 is now auto-detected by HTTP/2 preface, if
HTTP/2 is enabled in the default virtual server. If preface is not matched,
HTTP/0.9-1.1 is assumed.
Previously, rec.level field was not uninitialized in SSL_provide_quic_data().
As a result, its value was always ssl_encryption_initial. Later in
ngx_quic_ciphers() such level resulted in resetting the cipher to
TLS1_3_CK_AES_128_GCM_SHA256 and using AES128 to encrypt the packet.
Now the level is initialized and the right cipher is used.
The layer is enabled as a fallback if the QUIC support is configured and the
BoringSSL API wasn't detected, or when using the --with-openssl option, also
compatible with QuicTLS and LibreSSL. For the latter, the layer is assumed
to be present if QUIC was requested, so it needs to be undefined to prevent
QUIC API redefinition as appropriate.
A previously used approach to test the TLSEXT_TYPE_quic_transport_parameters
macro doesn't work with OpenSSL 3.2 master branch where this macro appeared
with incompatible QUIC API. To fix the build there, the test is revised to
pass only for QuicTLS and LibreSSL.
Previously, ngx_quic_close_connection() could be called in a way that QUIC
connection was accessed after the call. In most cases the connection is not
closed right away, but close timeout is scheduled. However, it's not always
the case. Also, if the close process started earlier for a different reason,
calling ngx_quic_close_connection() may actually close the connection. The
connection object should not be accessed after that.
Now, when possible, return statement is added to eliminate post-close connection
object access. In other places ngx_quic_close_connection() is substituted with
posting close event.
Also, the new way of closing connection in ngx_quic_stream_cleanup_handler()
fixes another problem in this function. Previously it passed stream connection
instead of QUIC connection to ngx_quic_close_connection(). This could result
in incomplete connection shutdown. One consequence of that could be that QUIC
streams were freed without shutting down their application contexts. This could
result in another use-after-free.
Found by Coverity (CID 1530402).
The qsock->sockaddr field is a ngx_sockaddr_t union, and therefore can hold
any sockaddr (and union members, such qsock->sockaddr.sockaddr, can be used
to access appropriate variant of the sockaddr). It is better to set it via
qsock->sockaddr itself though, and not qsock->sockaddr.sockaddr, so static
analyzers won't complain about out-of-bounds access.
Prodded by Coverity (CID 1530403).
Previously, ngx_udp_rbtree_insert_value() was used for plain UDP and
ngx_quic_rbtree_insert_value() was used for QUIC. Because of this it was
impossible to initialize connection tree in ngx_create_listening() since
this function is not aware what kind of listening it creates.
Now ngx_udp_rbtree_insert_value() is used for both QUIC and UDP. To make
is possible, a generic key field is added to ngx_udp_connection_t. It keeps
client address for UDP and connection ID for QUIC.
The directive used to set the value of the "max_udp_payload_size" transport
parameter. According to RFC 9000, Section 18.2, the value specifies the size
of buffer for reading incoming datagrams:
This limit does act as an additional constraint on datagram size in
the same way as the path MTU, but it is a property of the endpoint
and not the path; see Section 14. It is expected that this is the
space an endpoint dedicates to holding incoming packets.
Current QUIC implementation uses the maximum possible buffer size (65527) for
reading datagrams.
HTTP and Stream variables $remote_addr and $binary_remote_addr rely on
constant client address, particularly because they are cacheable.
However, QUIC client may migrate to a new address. While there's no perfect
way to handle this, the proposed solution is to copy client address to QUIC
stream at stream creation.
The change also fixes truncated $remote_addr if migration happened while the
stream was active. The reason is addr_text string was copied to stream by
value.
The existing logic to evaluate multi header "$sent_http_*" variables,
such as $sent_http_cache_control, as previously introduced in 1.23.0,
doesn't take into account that one or more elements can be cleared,
yet still present in a linked list, pointed to by the next field.
Such elements don't contribute to the resulting variable length, an
attempt to append a separator for them ends up in out of bounds write.
This is not possible with standard modules, though at least one third
party module is known to override multi header values this way, so it
makes sense to harden the logic.
The fix restores a generic boundary check.
Previously, the value was not set and remained zero. While in nginx code the
value of c->sockaddr is accessed without taking c->socklen into account,
invalid c->socklen could lead to unexpected results in third-party modules.
Previously, the post-migration value of addr_text could be truncated, if
it was longer than the previous one. Also, the new value always included
port, which should not be there.
According to RFC 9000, 8.2.4. Failed Path Validation,
the following value is recommended as a validation timeout:
A value of three times the larger of the current PTO
or the PTO for the new path (using kInitialRtt, as
defined in [QUIC-RECOVERY]) is RECOMMENDED.
The change adds PTO of the new path to the equation as the lower bound.
Path validation packets containing PATH_CHALLENGE frames are sent separately
from regular frame queue, because of the need to use a decicated path and pad
the packets. The packets are sent periodically, separately from the regular
probe/lost detection mechanism. A path validation packet is resent up to 3
times, each time after PTO expiration, with increasing per-path PTO backoff.
The check is needed for clients in order to unblock a server due to
anti-amplification limits, and it seems to make no sense for servers.
See RFC 9002, A.6 and A.8 for a further explanation.
This makes max_ack_delay to now always account, notably including
PATH_CHALLENGE timers as noted in the last paragraph of 9000, 9.4,
unlike when it was only used when there are packets in flight.
While here, fixed nearby style.
Previously, ssl_encryption_application was hardcoded. Before 9553eea74f2a,
ngx_quic_frame_sendto() was used only for PATH_CHALLENGE/PATH_RESPONSE sent
at the application level only. Since 9553eea74f2a, ngx_quic_frame_sendto()
is also used for CONNECTION_CLOSE, which can be sent at initial level after
SSL handshake error or rejection. This resulted in packet encryption error.
Now level is copied from frame, which fixes the error.
Previously, before sending CONNECTION_CLOSE to client, all pending frames
were sent. This is redundant and could prevent CONNECTION_CLOSE from being
sent due to congestion control. Now pending frames are freed and
CONNECTION_CLOSE is sent without congestion control, as advised by RFC 9002:
Packets containing frames besides ACK or CONNECTION_CLOSE frames
count toward congestion control limits and are considered to be in flight.
Do not corrupt frame data chain pointer on ngx_quic_read_buffer() error.
The error leads to closing a QUIC connection where the frame may be used
as part of the QUIC connection tear down, which envolves writing pending
frames, including this one.
The rcf->studies list is unconditionally accessed by ngx_regex_cleanup(),
and this used to cause NULL pointer dereference if allocation
failed. Fix is to set cleanup handler only when allocation succeeds.
Previously, waiting on a shared connection was not allowed, because the only
type of such connection was plain UDP. However, QUIC stream connections are
also shared since they share socket descriptor with the listen connection.
Meanwhile, it's perfectly normal to wait on such connections.
The issue manifested itself with stream write errors when the amount of data
exceeded stream buffer size or flow control. Now no error is triggered
and Stream write module is allowed to wait for buffer space to become available.
When a stream is created by client, it's often the case that nginx will send
immediate response on that stream. An example is HTTP/3 request stream, which
in most cases quickly replies with at least HTTP headers.
QUIC stream init handlers are called from a posted event. Output QUIC
frames are also sent to client from a posted event, called the push event.
If the push event is posted before the stream init event, then output produced
by stream may trigger sending an extra UDP datagram. To address this, push
event is now re-posted when a new stream init event is posted.
An example is handling 0-RTT packets. Client typically sends an init packet
coalesced with a 0-RTT packet. Previously, nginx replied with a padded CRYPTO
datagram, followed by a 1-RTT stream reply datagram. Now CRYPTO and STREAM
packets are coalesced in one reply datagram, which saves bandwidth.
Other examples include coalescing 1-RTT first stream response, and
MAX_STREAMS/STREAM sent in response to ACK/STREAM.
It now uses custom alloc_aligned() wrapper for all allocations,
therefore all allocations are larger than expected by (64 + sizeof(void*)).
Further, they are seen as allocations of 1 element. Relevant calculations
were adjusted to reflect this, and state allocation is now protected
with a flag to avoid misinterpreting other allocations as the zlib
deflate_state allocation.
Further, it no longer forces window bits to 13 on compression level 1,
so the comment was adjusted to reflect this.
When establishing a connection to the backend, nginx blocks reading
from the client with ngx_mail_proxy_block_read(). Previously, such
events were lost, and in some cases this resulted in connection hangs.
Notably, this affected mail_imap_ssl.t on Windows, since the test
closes connections after requesting authentication, but without
waiting for any responses (so the connection close events might be
lost).
Fix is to post an event to read from the client after connecting to
the backend if there were blocked events.
SSL context is not present if the default server has neither certificates nor
ssl_reject_handshake enabled. Previously, this led to null pointer dereference
before it would be caught with configuration checks.
Additionally, non-default servers with distinct SSL contexts need to initialize
compatibility layer in order to complete a QUIC handshake.
This ensures that errors which happen during logging to syslog are logged
with proper context, such as "while logging to syslog" and the server name.
Prodded by Safar Safarly.
During initial startup the ngx_cycle->hostname is not available, and
previously this resulted in incorrect logging. Instead, hostname from the
configuration being parsed is now preserved in the syslog peer structure
and then used during logging.
Similarly, ngx_cycle->log might not match the configuration where the
syslog peer is defined if the configuration is not yet fully applied,
and previously this resulted in unexpected logging of syslog errors
and debug information. Instead, cf->cycle->new_log is now referenced
in the syslog peer structure and used for logging, similarly to how it
is done in other modules.
Similarly to ticket #274 (7354:1812f1d79d84), early request finalization
without calling ngx_http_run_posted_requests() resulted in a connection
hang (a socket leak) if the 400 (Bad Request) error was generated in
ngx_http_v2_state_process_header() due to invalid request headers and
"return 444" was used in error_page 400.
As tested with tlsfuzzer with LibreSSL 3.7.0, the following errors are
certainly client-related:
SSL_do_handshake() failed (SSL: error:14026073:SSL routines:ACCEPT_SR_CLNT_HELLO:bad packet length)
SSL_do_handshake() failed (SSL: error:1402612C:SSL routines:ACCEPT_SR_CLNT_HELLO:ssl3 session id too long)
SSL_do_handshake() failed (SSL: error:140380EA:SSL routines:ACCEPT_SR_KEY_EXCH:tls rsa encrypted value length is wrong)
Accordingly, the SSL_R_BAD_PACKET_LENGTH ("bad packet length"),
SSL_R_SSL3_SESSION_ID_TOO_LONG ("ssl3 session id too long"),
SSL_R_TLS_RSA_ENCRYPTED_VALUE_LENGTH_IS_WRONG ("tls rsa encrypted value
length is wrong") errors are now logged at the "info" level.
To further differentiate client-related errors and adjust logging levels
of various SSL errors, nginx was tested with tlsfuzzer with multiple
OpenSSL versions (3.1.0-beta1, 3.0.8, 1.1.1t, 1.1.0l, 1.0.2u, 1.0.1u,
1.0.0s, 0.9.8zh).
The following errors were observed during tlsfuzzer runs with OpenSSL 3.0.8,
and are clearly client-related:
SSL_do_handshake() failed (SSL: error:0A000092:SSL routines::data length too long)
SSL_do_handshake() failed (SSL: error:0A0000A0:SSL routines::length too short)
SSL_do_handshake() failed (SSL: error:0A000124:SSL routines::bad legacy version)
SSL_do_handshake() failed (SSL: error:0A000178:SSL routines::no shared signature algorithms)
Accordingly, the SSL_R_DATA_LENGTH_TOO_LONG ("data length too long"),
SSL_R_LENGTH_TOO_SHORT ("length too short"), SSL_R_BAD_LEGACY_VERSION
("bad legacy version"), and SSL_R_NO_SHARED_SIGNATURE_ALGORITHMS
("no shared signature algorithms", misspelled as "sigature" in OpenSSL 1.0.2)
errors are now logged at the "info" level.
Additionally, the following errors were observed with OpenSSL 3.0.8 and
with TLSv1.3 enabled:
SSL_do_handshake() failed (SSL: error:0A00006F:SSL routines::bad digest length)
SSL_do_handshake() failed (SSL: error:0A000070:SSL routines::missing sigalgs extension)
SSL_do_handshake() failed (SSL: error:0A000096:SSL routines::encrypted length too long)
SSL_do_handshake() failed (SSL: error:0A00010F:SSL routines::bad length)
SSL_read() failed (SSL: error:0A00007A:SSL routines::bad key update)
SSL_read() failed (SSL: error:0A000125:SSL routines::mixed handshake and non handshake data)
Accordingly, the SSL_R_BAD_DIGEST_LENGTH ("bad digest length"),
SSL_R_MISSING_SIGALGS_EXTENSION ("missing sigalgs extension"),
SSL_R_ENCRYPTED_LENGTH_TOO_LONG ("encrypted length too long"),
SSL_R_BAD_LENGTH ("bad length"), SSL_R_BAD_KEY_UPDATE ("bad key update"),
and SSL_R_MIXED_HANDSHAKE_AND_NON_HANDSHAKE_DATA ("mixed handshake and non
handshake data") errors are now logged at the "info" level.
Additionally, the following errors were observed with OpenSSL 1.1.1t:
SSL_do_handshake() failed (SSL: error:14094091:SSL routines:ssl3_read_bytes:data between ccs and finished)
SSL_do_handshake() failed (SSL: error:14094199:SSL routines:ssl3_read_bytes:too many warn alerts)
SSL_read() failed (SSL: error:1408F0C6:SSL routines:ssl3_get_record:packet length too long)
SSL_read() failed (SSL: error:14094085:SSL routines:ssl3_read_bytes:ccs received early)
Accordingly, the SSL_R_CCS_RECEIVED_EARLY ("ccs received early"),
SSL_R_DATA_BETWEEN_CCS_AND_FINISHED ("data between ccs and finished"),
SSL_R_PACKET_LENGTH_TOO_LONG ("packet length too long"), and
SSL_R_TOO_MANY_WARN_ALERTS ("too many warn alerts") errors are now logged
at the "info" level.
Additionally, the following errors were observed with OpenSSL 1.0.2u:
SSL_do_handshake() failed (SSL: error:1407612A:SSL routines:SSL23_GET_CLIENT_HELLO:record too small)
SSL_do_handshake() failed (SSL: error:1408C09A:SSL routines:ssl3_get_finished:got a fin before a ccs)
Accordingly, the SSL_R_RECORD_TOO_SMALL ("record too small") and
SSL_R_GOT_A_FIN_BEFORE_A_CCS ("got a fin before a ccs") errors are now
logged at the "info" level.
No additional client-related errors were observed while testing with
OpenSSL 3.1.0-beta1, OpenSSL 1.1.0l, OpenSSL 1.0.1u, OpenSSL 1.0.0s,
and OpenSSL 0.9.8zh.
In some cases there might be multiple errors in the OpenSSL error queue,
notably when a libcrypto call fails, and then the SSL layer generates
an error itself. For example, the following errors were observed
with OpenSSL 3.0.8 with TLSv1.3 enabled:
SSL_do_handshake() failed (SSL: error:02800066:Diffie-Hellman routines::invalid public key error:0A000132:SSL routines::bad ecpoint)
SSL_do_handshake() failed (SSL: error:08000066:elliptic curve routines::invalid encoding error:0A000132:SSL routines::bad ecpoint)
SSL_do_handshake() failed (SSL: error:0800006B:elliptic curve routines::point is not on curve error:0A000132:SSL routines::bad ecpoint)
In such cases it seems to be better to determine logging level based on
the last error in the error queue (the one added by the SSL layer,
SSL_R_BAD_ECPOINT in all of the above example example errors). To do so,
the ngx_ssl_connection_error() function was changed to use
ERR_peek_last_error().
An UTF-8 octet sequence cannot start with a 11111xxx byte (above 0xf8),
see https://datatracker.ietf.org/doc/html/rfc3629#section-3. Previously,
such bytes were accepted by ngx_utf8_decode() and misinterpreted as 11110xxx
bytes (as in a 4-byte sequence). While unlikely, this can potentially cause
issues.
Fix is to explicitly reject such bytes in ngx_utf8_decode().
Just a drive letter might not correctly represent file system being used,
notably when using symlinks (as created by "mklink /d"). As such, instead
of trying to call GetDiskFreeSpace() with just a drive letter, we now always
use GetDiskFreeSpace() with full path.
Further, it looks like the code to use just a drive letter never worked,
since it tried to test name[2] instead of name[1] to be ':'.
This ensures that ngx_win32_rename_file() will support non-ASCII names
when supported by the wrappers.
Notably, this is used by PUT requests in the dav module when overwriting
existing files with non-ASCII names (ticket #1433).
Previously, ngx_win32_rename_file() retried on all errors returned by
MoveFile() to a temporary name. It only make sense, however, to retry
when the destination file already exists, similarly to the condition
when ngx_win32_rename_file() is called. Retrying on other errors is
meaningless and might result in an infinite loop.
This makes it possible to create directories under prefix with non-ASCII
characters, as well as makes it possible to create directories with non-ASCII
characters when using the dav module (ticket #1433).
To ensure that the dav module operations are restricted similarly to
other file operations (in particular, short names are not allowed), the
ngx_win32_check_filename() function is used. It improved to support
checking of just dirname, and now can be used to check paths when creating
files or directories.
Notably, ngx_open_dir() now supports opening directories with non-ASCII
characters, and directory entries returned by ngx_read_dir() are properly
converted to UTF-8.
To ensure proper target selection the NGX_MACHINE variable is now set
based on the MSVC compiler output, and the OpenSSL target is set based
on it.
This is not important as long as "no-asm" is used (as in misc/GNUmakefile
and win32 build instructions), but might be beneficial if someone is trying
to build OpenSSL with assembler code.
Previously, NGX_MACHINE was not set when crossbuilding, resulting in
NGX_ALIGNMENT=16 being used in 32-bit builds (if not explicitly set to a
correct value). This in turn might result in memory corruption in
ngx_palloc() (as there are no usable aligned allocator on Windows, and
normal malloc() is used instead, which provides 8 byte alignment on
32-bit platforms).
To fix this, now i386 machine is set when crossbuilding, so nginx won't
assume strict alignment requirements.
Output examples in English, Russian, and Spanish:
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86
Оптимизирующий 32-разрядный компилятор Microsoft (R) C/C++ версии 16.00.30319.01 для 80x86
Compilador de optimización de C/C++ de Microsoft (R) versión 16.00.30319.01 para x64
Since most of the words are translated, instead of looking for the words
"Compiler Version" we now search for "C/C++" and the version number.
This is expected to help with clients using pipelining with some constant
depth, such as apt[1][2].
When downloading many resources, apt uses pipelining with some constant
depth, a number of requests in flight. This essentially means that after
receiving a response it sends an additional request to the server, and
this can result in requests arriving to the server at any time. Further,
additional requests are sent one-by-one, and can be easily seen as such
(neither as pipelined, nor followed by pipelined requests).
The only safe approach to close such connections (for example, when
keepalive_requests is reached) is with lingering. To do so, now nginx
monitors if pipelining was used on the connection, and if it was, closes
the connection with lingering.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=973861#10
[2] https://mailman.nginx.org/pipermail/nginx-devel/2023-January/ZA2SP5SJU55LHEBCJMFDB2AZVELRLTHI.html
Since 4611:2b6cb7528409 responses from the gzip static, flv, and mp4 modules
can be used with subrequests, though empty files were not properly handled.
Empty gzipped, flv, and mp4 files thus resulted in "zero size buf in output"
alerts. While valid corresponding files are not expected to be empty, such
files shouldn't result in alerts.
Fix is to set b->sync on such empty subrequest responses, similarly to what
ngx_http_send_special() does.
Additionally, the static module, the ngx_http_send_response() function, and
file cache are modified to do the same instead of not sending the response
body at all in such cases, since not sending the response body at all is
believed to be at least questionable, and might break various filters
which do not expect such behaviour.
The "listen" directive in the http module can be used multiple times
in different server blocks. Originally, it was supposed to be specified
once with various socket options, and without any parameters in virtual
server blocks. For example:
server { listen 80 backlog=1024; server_name foo; ... }
server { listen 80; server_name bar; ... }
server { listen 80; server_name bazz; ... }
The address part of the syntax ("address[:port]" / "port" / "unix:path")
uniquely identifies the listening socket, and therefore is enough for
name-based virtual servers (to let nginx know that the virtual server
accepts requests on the listening socket in question).
To ensure that listening options do not conflict between virtual servers,
they were allowed only once. For example, the following configuration
will be rejected ("duplicate listen options for 0.0.0.0:80 in ..."):
server { listen 80 backlog=1024; server_name foo; ... }
server { listen 80 backlog=512; server_name bar; ... }
At some point it was, however, noticed, that it is sometimes convenient
to repeat some options for clarity. In nginx 0.8.51 the "ssl" parameter
was allowed to be specified multiple times, e.g.:
server { listen 443 ssl backlog=1024; server_name foo; ... }
server { listen 443 ssl; server_name bar; ... }
server { listen 443 ssl; server_name bazz; ... }
This approach makes configuration more readable, since SSL sockets are
immediately visible in the configuration. If this is not needed, just the
address can still be used.
Later, additional protocol-specific options similar to "ssl" were
introduced, notably "http2" and "proxy_protocol". With these options,
one can write:
server { listen 443 ssl backlog=1024; server_name foo; ... }
server { listen 443 http2; server_name bar; ... }
server { listen 443 proxy_protocol; server_name bazz; ... }
The resulting socket will use ssl, http2, and proxy_protocol, but this
is not really obvious from the configuration.
To emphasize such misleading configurations are discouraged, nginx now
warns as long as the "listen" directive is used with options different
from the options previously used if this is potentially confusing.
In particular, the following configurations are allowed:
server { listen 8401 ssl backlog=1024; server_name foo; }
server { listen 8401 ssl; server_name bar; }
server { listen 8401 ssl; server_name bazz; }
server { listen 8402 ssl http2 backlog=1024; server_name foo; }
server { listen 8402 ssl; server_name bar; }
server { listen 8402 ssl; server_name bazz; }
server { listen 8403 ssl; server_name bar; }
server { listen 8403 ssl; server_name bazz; }
server { listen 8403 ssl http2; server_name foo; }
server { listen 8404 ssl http2 backlog=1024; server_name foo; }
server { listen 8404 http2; server_name bar; }
server { listen 8404 http2; server_name bazz; }
server { listen 8405 ssl http2 backlog=1024; server_name foo; }
server { listen 8405 ssl http2; server_name bar; }
server { listen 8405 ssl http2; server_name bazz; }
server { listen 8406 ssl; server_name foo; }
server { listen 8406; server_name bar; }
server { listen 8406; server_name bazz; }
And the following configurations will generate warnings:
server { listen 8501 ssl http2 backlog=1024; server_name foo; }
server { listen 8501 http2; server_name bar; }
server { listen 8501 ssl; server_name bazz; }
server { listen 8502 backlog=1024; server_name foo; }
server { listen 8502 ssl; server_name bar; }
server { listen 8503 ssl; server_name foo; }
server { listen 8503 http2; server_name bar; }
server { listen 8504 ssl; server_name foo; }
server { listen 8504 http2; server_name bar; }
server { listen 8504 proxy_protocol; server_name bazz; }
server { listen 8505 ssl http2 proxy_protocol; server_name foo; }
server { listen 8505 ssl http2; server_name bar; }
server { listen 8505 ssl; server_name bazz; }
server { listen 8506 ssl http2; server_name foo; }
server { listen 8506 ssl; server_name bar; }
server { listen 8506; server_name bazz; }
server { listen 8507 ssl; server_name bar; }
server { listen 8507; server_name bazz; }
server { listen 8507 ssl http2; server_name foo; }
server { listen 8508 ssl; server_name bar; }
server { listen 8508; server_name bazz; }
server { listen 8508 ssl backlog=1024; server_name foo; }
server { listen 8509; server_name bazz; }
server { listen 8509 ssl; server_name bar; }
server { listen 8509 ssl backlog=1024; server_name foo; }
The basic idea is that at most two sets of protocol options are allowed:
the main one (with socket options, if any), and a shorter one, with options
being a subset of the main options, repeated for clarity. As long as the
shorter set of protocol options is used, all listen directives except the
main one should use it.
Now "listen" directve has a new "quic" parameter which enables QUIC protocol
for the address. Further, to enable HTTP/3, a new directive "http3" is
introduced. The hq-interop protocol is enabled by "http3_hq" as before.
Now application protocol is chosen by ALPN.
Previously used "http3" parameter of "listen" is deprecated.
A QUIC handshake failure breaks down into several cases:
- a handshake error which leads to a send_alert call
- an error triggered by the add_handshake_data callback
- internal errors (allocation etc)
Previously, in the first case, only error code was set in the send_alert
callback. Now the "handshake failed" reason phrase is set there as well.
In the second case, both code and reason are set by add_handshake_data.
In the last case, setting reason phrase is removed: returning NGX_ERROR
now leads to closing the connection with just INTERNAL_ERROR.
Reported by Jiuzhou Cui.
Previously, since 3550b00d9dc8, the token was allocated on stack, to get
rid of pool usage. Now the token is allocated by ngx_quic_copy_buffer()
in QUIC buffers, also used for STREAM, CRYPTO and ACK frames.
Previously, location prefix length in ngx_http_location_tree_node_t was
stored as "u_char", and therefore location prefixes longer than 255 bytes
were handled incorrectly.
Fix is to use "u_short" instead. With "u_short", prefixes up to 65535 bytes
can be safely used, and this isn't reachable due to NGX_CONF_BUFFER, which
is 4096 bytes.
In contrast to on-the-fly gzipping with gzip filter, static gzipped
representation as returned by gzip_static is persistent, and therefore
the same binary representation is available for future requests, making
it possible to use range requests.
Further, if a gzipped representation is re-generated with different
compression settings, it is expected to result in different ETag and
different size reported in the Content-Range header, making it possible
to safely use range requests anyway.
As such, ranges are now allowed for files returned by gzip_static.
In nginx source code the inttypes.h include, if available, is used to define
standard integer types. Changed the SO_COOKIE configure test to follow this.
Specifically, now it is kept unset until streams are initialized.
Notably, this unbreaks OCSP with client certificates after 35e27117b593.
Previously, the read event could be posted prematurely via ngx_quic_set_event()
e.g., as part of handling a STREAM frame.
Previously, streams were initialized in early keys handler. However, client
transport parameters may not be available by then. This happens, for example,
when using QuicTLS. Now streams are initialized in ngx_quic_crypto_input()
after calling SSL_do_handshake() for both 0-RTT and 1-RTT.
Previously, there was no timeout for a request stream blocked on insert count,
which could result in infinite wait. Now client_header_timeout is set when
stream is first blocked.
Now, when RESET_STREAM is sent or received, or when streams are closed,
stream connection error flag is set. Previously, only stream state was
changed, which resulted in setting the error flag only after calling
recv()/send()/send_chain(). However, there are cases when none of these
functions is called, but it's still important to know if the stream is being
closed. For example, when an HTTP/3 request stream is blocked on insert count,
receiving RESET_STREAM should trigger stream closure, which was not the case.
The change also fixes ngx_http_upstream_check_broken_connection() and
ngx_http_test_reading() with QUIC streams.
Previously, stream events were added and deleted by ngx_handle_read_event() and
ngx_handle_write_event() in a way similar to level-triggered events. However,
QUIC stream events are effectively edge-triggered and can stay active all time.
Moreover, the events are now active since the moment a stream is created.
Previously, start_time wasn't set for a new stream.
The fix is to derive it from the parent connection.
Also it's used to simplify tracking keepalive_time.
As per RFC 9204, section 3.2.2, a new entry can reference an entry in the
dynamic table that will be evicted when adding this new entry into the dynamic
table.
Previously, such inserts resulted in use-after-free since the old entry was
evicted before the insertion (ticket #2431). Now it's evicted after the
insertion.
This change fixes Insert with Name Reference and Duplicate encoder instructions.
Ports difference must be respected when checking addresses for duplicates,
otherwise configurations like this are broken:
listen 127.0.0.1:6000-6005
It was broken by 4cc2bfeff46c (nginx 1.23.3).
Fixed event flags handling edge cases in ngx_wsarecv() and ngx_wsarecv_chain(),
notably to always reset rev->ready in case of errors (which wasn't the case
after ngx_socket_nread() errors), and after EOF (rev->ready was not cleared
if due to a misconfiguration a zero-sized buffer was used for reading).
With this change, behaviour of ngx_ssl_recv() now matches ngx_unix_recv(),
which used to always reset c->read->ready to 0 when returning errors.
This fixes an infinite loop in unbuffered SSL proxying if writing to the
client is blocked and an SSL error happens (ticket #2418).
With this change, the fix for a similar issue in the stream module
(6868:ee3645078759), which used a different approach of explicitly
testing c->read->error instead, is no longer needed and was reverted.
Casts are believed to be not needed, since memcmp() has "const void *"
arguments since introduction of the "void" type in C89. And on pre-C89
platforms nginx is unlikely to compile without warnings anyway, as there
are no casts in memcpy() and memmove() calls.
These casts were added in 1648:89a47f19b9ec without any details on why they
were added, and Igor does not remember details either. The most plausible
explanation is that they were copied from ngx_strcmp() and were not really
needed even at that time.
Prodded by Alejandro Colomar.
Binary upgrades are not supported without master process, but it is,
however, possible, that nginx running with master process is asked
to upgrade binary, and the configuration file as available on disk
at this time includes "master_process off;".
If this happens, listening sockets inherited from the previous binary
will have ls[i].previous set. But the old cycle on initial process
startup, including startup after binary upgrade, is destroyed by
ngx_init_cycle() once configuration parsing is complete. As a result,
an attempt to dereference ls[i].previous in ngx_event_process_init()
accesses already freed memory.
Fix is to avoid looking into ls[i].previous if the old cycle is already
freed.
With this change it is also no longer needed to clear ls[i].previous in
worker processes, so the relevant code was removed.
Cloning of listening sockets for each worker process does not make sense
when working without master process, and causes some of the connections
not to be accepted if worker_processes is set to more than one and there
are listening sockets configured with the reuseport flag. Fix is to
disable cloning when master process is disabled.
Due to the glibc bug[1], getaddrinfo("localhost") with AI_ADDRCONFIG
on a typical host with glibc and without IPv6 returns two 127.0.0.1
addresses, and therefore "listen localhost:80;" used to result in
"duplicate ... address and port pair" after 4f9b72a229c1.
Fix is to explicitly filter out duplicate addresses returned during
resolution of a name.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=14969
Previously, if an event was posted by a read event handler, called by
ngx_close_idle_connections(), that event was not processed until the next
event loop iteration, which could happen after a timeout.
As the SSI parser always uses the context from the main request for storing
variables and blocks, that context should always exist for subrequests using
SSI, even though the main request does not necessarily have SSI enabled.
However, `ngx_http_get_module_ctx(r->main, ...)` is getting NULL in such cases,
resulting in the worker crashing SIGSEGV when accessing its attributes.
This patch links the first initialized context to the main request, and
upgrades it only when main context is initialized.
The check is not expected to fail unless there is a bug in the calling
code. But given the check is here, it should log an alert if it fails
instead of silently closing the connection.
Maximum size for reading the PROXY protocol header is increased to 4096 to
accommodate a bigger number of TLVs, which are supported since cca4c8a715de.
Maximum size for writing the PROXY protocol header is not changed since only
version 1 is currently supported.
Previously, keepalive timer was deleted in ngx_http_v3_wait_request_handler()
and set in request cleanup handler. This worked for HTTP/3 connections, but not
for hq connections. Now keepalive timer is deleted in
ngx_http_v3_init_request_stream() and set in connection cleanup handler,
which works both for HTTP/3 and hq.
It's called after handshake completion or prior to the first early data stream
creation. The callback should initialize application-level data before
creating streams.
HTTP/3 callback implementation sets keepalive timer and sends SETTINGS.
Also, this allows to limit max handshake time in ngx_http_v3_init_stream().
Most atoms should not appear more than once in a container. Previously,
this was not enforced by the module, which could result in worker process
crash, memory corruption and disclosure.
Now it properly detects invalid shared zone configuration with omitted size.
Previously it used to read outside of the buffer boundary.
Found with AddressSanitizer.
OpenSSL with TLSv1.3 updates the session creation time on session
resumption and keeps the session timeout unmodified, making it possible
to maintain the session forever, bypassing client certificate expiration
and revocation. To make sure session timeouts are actually used, we
now update the session creation time and reduce the session timeout
accordingly.
BoringSSL with TLSv1.3 ignores configured session timeouts and uses a
hardcoded timeout instead, 7 days. So we update session timeout to
the configured value as soon as a session is created.
Instead of syncing keys with shared memory on each ticket operation,
the code now does this only when the worker is going to change expiration
of the current key, or going to switch to a new key: that is, usually
at most once per second.
To do so without races, the code maintains 3 keys: current, previous,
and next. If a worker will switch to the next key earlier, other workers
will still be able to decrypt new tickets, since they will be encrypted
with the next key.
As long as ssl_session_cache in shared memory is configured, session ticket
keys are now automatically generated in shared memory, and rotated
periodically. This can be beneficial from forward secrecy point of view,
and also avoids increased CPU usage after configuration reloads.
This also helps BoringSSL to properly resume sessions in configurations
with multiple worker processes and no ssl_session_ticket_key directives,
as BoringSSL tries to automatically rotate session ticket keys and does
this independently in different worker processes, thus breaking session
resumption between worker processes.
Given the present typical SSL session sizes, on 32-bit platforms it is
now beneficial to store all data in a single allocation, since rbtree
node + session id + ASN1 representation of a session takes 256 bytes of
shared memory (36 + 32 + 150 = about 218 bytes plus SNI server name).
Storing all data in a single allocation is beneficial for SNI names up to
about 40 characters long and makes it possible to store about 4000 sessions
in one megabyte (instead of about 3000 sessions now). This also slightly
simplifies the code.
Session ids are not expected to be longer than 32 bytes, but this is
theoretically possible with TLSv1.3, where session ids are essentially
arbitrary and sent as session tickets. Since on 64-bit platforms we
use fixed 32-byte buffer for session ids, added an explicit length check
to make sure the buffer is large enough.
Session cache allocations might fail as long as the new session is different
in size from the one least recently used (and freed when the first allocation
fails). In particular, it might not be possible to allocate space for
sessions with client certificates, since they are noticeably bigger than
normal sessions.
To ensure such allocation failures won't clutter logs, logging level changed
to "warn", and logging is now limited to at most one warning per second.
OpenSSL tries to save TLSv1.3 sessions into session cache even when using
tickets for stateless session resumption, "because some applications just
want to know about the creation of a session". To avoid trashing session
cache with useless data, we do not save such sessions now.
The variables have prefix $proxy_protocol_tlv_ and are accessible by name
and by type. Examples are: $proxy_protocol_tlv_0x01, $proxy_protocol_tlv_alpn.
Previously, all received user input was logged. If a multi-line text was
received from client and logged, it could reduce log readability and also make
it harder to parse nginx log by scripts. The change brings to PROXY protocol
the same behavior that exists for HTTP request line in
ngx_http_log_error_handler().
SSL_sendfile() expects integer file descriptor as an argument, but nginx
uses OS file handles (HANDLE) to work with files on Windows, and passing
HANDLE instead of an integer correctly results in build failure. Since
SSL_sendfile() is not expected to work on Windows anyway, the code is now
disabled on Windows with appropriate compile-time checks.
Multiple C4306 warnings (conversion from 'type1' to 'type2' of greater size)
appear during 64-bit compilation with MSVC 2010 (and older) due to extensively
used constructs like "(void *) -1", so they were disabled.
In newer MSVC versions C4306 warnings were replaced with C4312 ones, and
these are not generated for such trivial type casts.
In 2014ed60f17f, "#if SSL_CTRL_SET_ECDH_AUTO" test was incorrectly used
instead of "#ifdef SSL_CTRL_SET_ECDH_AUTO". There is no practical
difference, since SSL_CTRL_SET_ECDH_AUTO evaluates to a non-zero numeric
value when defined, but anyway it's better to correctly test if the value
is defined.
All these events are created in context of a client connection and are deleted
when the connection is closed. Setting ev->cancelable could trigger premature
connection closure and a socket leak alert.
Now main QUIC connection for HTTP/3 always has c->idle flag set. This allows
the connection to receive worker shutdown notification. It is passed to
application level via a new conf->shutdown() callback.
The HTTP/3 shutdown callback sends GOAWAY to client and gracefully shuts down
the QUIC connection.
Previously, stream was kept alive until all its data is sent. This resulted
in disabling retransmission of final part of stream when QUIC connection
was closed right after closing stream connection.
The connection is automatically switched to this mode by transport layer when
there are no non-cancelable streams. Currently, cancelable streams are
HTTP/3 encoder/decoder/control streams.
Previously, ngx_quic_finalize_connection() closed the connection with NGX_ERROR
code, which resulted in immediate connection closure. Now the code is NGX_OK,
which provides a more graceful shutdown with a timeout.
Previously, zero was used for this purpose. However, NGX_QUIC_ERR_NO_ERROR is
zero too. As a result, NGX_QUIC_ERR_NO_ERROR was changed to
NGX_QUIC_ERR_INTERNAL_ERROR when closing a QUIC connection.
If a client packet carrying a stream data frame is not acked due to packet loss,
the stream data is retransmitted later by client. It's also possible that the
retransmitted range is bigger than before due to more stream data being
available by then. If the original data was read out by the application,
there would be no read event triggered by the retransmitted frame, even though
it contains new data.
They are not supported by MSVC till 2012.
SSL_QUIC_METHOD initialization is moved to run-time to preserve portability
among SSL library implementations, which allows to reduce its visibility.
Note using of a static storage to keep SSL_set_quic_method() reference valid.
Previously, ngx_quic_hkdf_t variables used declaration with assignment
in the middle of a function, which is not supported by MSVC 2010.
Fixing this also required to rewrite the ngx_quic_hkdf_set macro
and to switch to an explicit array size.
Previously, HTTP/3 stream connection didn't inherit the servername regex
from the main QUIC connection saved when processing SNI and using regular
expressions in server names. As a result, it didn't execute to set regex
captures when choosing the virtual server while parsing HTTP/3 headers.
The type field was added in 7999d3fbb765 at early stages of QUIC implementation
and was not initialized for default listen. Missing initialization resulted in
default listen socket creation error.
SSL_CIPHER_get_protocol_id() appeared in BoringSSL somewhere between
BORINGSSL_API_VERSION 12 and 13 for compatibility with OpenSSL 1.1.1.
It was adopted without a proper macro test, which remained unnoticed.
This justifies that such old BoringSSL API isn't widely used and its
support can be dropped.
While here, removed SSL_set_quic_use_legacy_codepoint() that became
useless after the default was flipped in BoringSSL over a year ago.
Setting QUIC methods is converted to use C99 designated initializers
for simplicity, as LibreSSL 3.6.0 has different SSL_QUIC_METHOD layout.
Additionally, only set_read_secret/set_write_secret callbacks are set.
Although they are preferred in LibreSSL over set_encryption_secrets,
better be on a safe side as LibreSSL has unexpectedly incompatible
set_encryption_secrets calling convention expressed in passing read
and write secrets split in separate calls, unlike this is documented
in old BoringSSL sources. To avoid introducing further changes for
the old API, it is simply disabled.
This function is present in QuicTLS only. After SSL_READ_EARLY_DATA_SUCCESS
became visible in LibreSSL together with experimental QUIC API, this required
to revise the conditional compilation test to use more narrow macros.
After BoringSSL aligned[1] with OpenSSL on TLS1_3_CK_* macros, and
LibreSSL uses OpenSSL naming, our own variants can be dropped now.
Compatibility is preserved with libraries that lack these macros.
Additionally, transition to SSL_CIPHER_get_id() fixes build error
with LibreSSL that doesn't implement SSL_CIPHER_get_protocol_id().
[1] https://boringssl.googlesource.com/boringssl/+/dfddbc4ded
The SSL_R_BAD_RECORD_TYPE ("bad record type") errors are reported by
OpenSSL 1.1.1 or newer when using TLSv1.3 if the client sends a record
with unknown or unexpected type. These errors are now logged at the
"info" level.
When client DATA frame header and its content come in different QUIC packets,
it may happen that only the header is processed by the first
ngx_http_v3_request_body_filter() call. In this case an empty request body
buffer is added to r->request_body->bufs, which is later reused in a
subsequent ngx_http_v3_request_body_filter() call without being removed from
the body chain. As a result, rb->request_body->bufs ends up with two copies of
the same buffer.
The fix is to avoid adding empty request body buffers to r->request_body->bufs.
When reading exactly rev->available bytes, rev->available might become 0
after FIONREAD usage introduction in efd71d49bde0. On the next call of
ngx_readv_chain() on systems with EPOLLRDHUP this resulted in return without
any actions, that is, with rev->ready set, and this in turn resulted in no
timers set in event pipe, leading to socket leaks.
Fix is to reset rev->ready in ngx_readv_chain() when returning due to
rev->available being 0 with EPOLLRDHUP, much like it is already done in
ngx_unix_recv(). This ensures that if rev->available will become 0, on
systems with EPOLLRDHUP support appropriate EPOLLRDHUP-specific handling
will happen on the next ngx_readv_chain() call.
While here, also synced ngx_readv_chain() to match ngx_unix_recv() and
reset rev->ready when returning due to rev->available being 0 with kqueue.
This is mostly cosmetic change, as rev->ready is anyway reset when
rev->available is set to 0.
Some servers might emit Content-Range header on 200 responses, and this
does not seem to contradict RFC 9110: as per RFC 9110, the Content-Range
header has no meaning for status codes other than 206 and 416. Previously
this resulted in duplicate Content-Range headers in nginx responses handled
by the range filter. Fix is to clear pre-existing headers.
Starting with OpenSSL 1.1.1, various additional errors can be reported
by OpenSSL in case of client-related issues, most notably during TLSv1.3
handshakes. In particular, SSL_R_BAD_KEY_SHARE ("bad key share"),
SSL_R_BAD_EXTENSION ("bad extension"), SSL_R_BAD_CIPHER ("bad cipher"),
SSL_R_BAD_ECPOINT ("bad ecpoint"). These are now logged at the "info"
level.
To ensure optimal use of memory, SSL contexts for proxying are now
inherited from previous levels as long as relevant proxy_ssl_* directives
are not redefined.
Further, when no proxy_ssl_* directives are redefined in a server block,
we now preserve plcf->upstream.ssl in the "http" section configuration
to inherit it to all servers.
Similar changes made in uwsgi, grpc, and stream proxy.
Similar to 70e65bf8dfd7, the change is made to ensure that the ability to
cancel resolver tasks is fully controlled by the caller. As mentioned in the
referenced commit, it is safe to make this timer cancelable because resolve
tasks can have their own timeouts that are not cancelable.
The scenario where this may become a problem is a periodic background resolve
task (not tied to a specific request or a client connection), which receives a
response with short TTL, large enough to warrant fallback to a TCP query.
With each event loop wakeup, we either have a previously set write timer
instance or schedule a new one. The non-cancelable write timer can delay or
block graceful shutdown of a worker even if the ngx_resolver_ctx_t->cancelable
flag is set by the API user, and there are no other tasks or connections.
We use the resolver API in this way to maintain the list of upstream server
addresses specified with the 'resolve' parameter, and there could be third-party
modules implementing similar logic.
When we generate the last_buf buffer for an UDP upstream recv error, it does
not contain any data from the wire. ngx_stream_write_filter attempts to forward
it anyways, which is incorrect (e.g., UDP upstream ECONNREFUSED will be
translated to an empty packet).
This happens because we mark the buffer as both 'flush' and 'last_buf', and
ngx_stream_write_filter has special handling for flush with certain types of
connections (see d127837c714f, 32b0ba4855a6). The flags are meant to be
mutually exclusive, so the fix is to ensure that flush and last_buf are not set
at the same time.
Reproduction:
stream {
upstream unreachable {
server 127.0.0.1:8880;
}
server {
listen 127.0.0.1:8998 udp;
proxy_pass unreachable;
}
}
1 0.000000000 127.0.0.1 → 127.0.0.1 UDP 47 45588 → 8998 Len=5
2 0.000166300 127.0.0.1 → 127.0.0.1 UDP 47 51149 → 8880 Len=5
3 0.000172600 127.0.0.1 → 127.0.0.1 ICMP 75 Destination unreachable (Port
unreachable)
4 0.000202400 127.0.0.1 → 127.0.0.1 UDP 42 8998 → 45588 Len=0
Fixes d127837c714f.
Both "count" and "duration" variables are 32-bit, so their product might
potentially overflow. It is used to reduce 64-bit start_time variable,
and with very large start_time this can result in incorrect seeking.
Found by Coverity (CID 1499904).
Now, if the directive is given an empty string, such configuration cancels
loading of certificates, in particular, if they would be otherwise inherited
from the previous level. This restores previous behaviour, before variables
support in certificates was introduced (3ab8e1e2f0f7).
Previously, if caching was disabled due to Expires in the past, nginx
failed to cache the response even if it was cacheable as per subsequently
parsed Cache-Control header (ticket #964).
Similarly, if caching was disabled due to Expires in the past,
"Cache-Control: no-cache" or "Cache-Control: max-age=0", caching was not
used if it was cacheable as per subsequently parsed X-Accel-Expires header.
Fix is to avoid disabling caching immediately after parsing Expires in
the past or Cache-Control, but rather set flags which are later checked by
ngx_http_upstream_process_headers() (and cleared by "Cache-Control: max-age"
and X-Accel-Expires).
Additionally, now X-Accel-Expires does not prevent parsing of cache control
extensions, notably stale-while-revalidate and stale-if-error. This
ensures that order of the X-Accel-Expires and Cache-Control headers is not
important.
Prodded by Vadim Fedorenko and Yugo Horie.
If a module adds multiple WWW-Authenticate headers (ticket #485) to the
response, linked in r->headers_out.www_authenticate, all headers are now
cleared if another module later allows access.
This change is a nop for standard modules, since the only access module which
can add multiple WWW-Authenticate headers is the auth request module, and
it is checked after other standard access modules. Though this might
affect some third party access modules.
Note that if a 3rd party module adds a single WWW-Authenticate header
and not yet modified to set the header's next pointer to NULL, attempt to
clear such a header with this change will result in a segmentation fault.
When using auth_request with an upstream server which returns 401
(Unauthorized), multiple WWW-Authenticate headers from the upstream server
response are now properly copied to the response.
When using proxy_intercept_errors and an error page for error 401
(Unauthorized), multiple WWW-Authenticate headers from the upstream server
response are now properly copied to the response.
Most of the known duplicate upstream response headers are now ignored
with a warning.
If syntax permits multiple headers, these are now properly linked to
the lists, notably Vary and WWW-Authenticate. This makes it possible
to further handle such lists where it makes sense.
With this change, duplicate Content-Length and Transfer-Encoding headers
are now rejected. Further, responses with invalid Content-Length or
Transfer-Encoding headers are now rejected, as well as responses with both
Content-Length and Transfer-Encoding.
The h->next pointer properly provided as NULL in all cases where known
output headers are added.
Note that there are 3rd party modules which might not do this, and it
might be risky to rely on this for arbitrary headers.
Since introduction of offset handling in ngx_http_upstream_copy_header_line()
in revision 573:58475592100c, the ngx_http_upstream_copy_content_encoding()
function is no longer needed, as its behaviour is exactly equivalent to
ngx_http_upstream_copy_header_line() with appropriate offset. As such,
the ngx_http_upstream_copy_content_encoding() function was removed.
Further, the u->headers_in.content_encoding field is not used anywhere,
so it was removed as well.
Further, Content-Encoding handling no longer depends on NGX_HTTP_GZIP,
as it can be used even without any gzip handling compiled in (for example,
in the charset filter).
As all known input headers are now linked lists, these are now handled
identically. In particular, this makes it possible to access properly
combined values of headers not specifically handled previously, such
as "Via" or "Connection".
The ngx_http_process_multi_header_lines() function is removed, as it is
exactly equivalent to ngx_http_process_header_line(). Similarly,
ngx_http_variable_header() is used instead of ngx_http_variable_headers().
Multi headers are now using linked lists instead of arrays. Notably,
the following fields were changed: r->headers_in.cookies (renamed
to r->headers_in.cookie), r->headers_in.x_forwarded_for,
r->headers_out.cache_control, r->headers_out.link, u->headers_in.cache_control
u->headers_in.cookies (renamed to u->headers_in.set_cookie).
The r->headers_in.cookies and u->headers_in.cookies fields were renamed
to r->headers_in.cookie and u->headers_in.set_cookie to match header names.
The ngx_http_parse_multi_header_lines() and ngx_http_parse_set_cookie_lines()
functions were changed accordingly.
With this change, multi headers are now essentially equivalent to normal
headers, and following changes will further make them equivalent.
Previously, $http_*, $sent_http_*, $sent_trailer_*, $upstream_http_*,
and $upstream_trailer_* variables returned only the first header (with
a few specially handled exceptions: $http_cookie, $http_x_forwarded_for,
$sent_http_cache_control, $sent_http_link).
With this change, all headers are returned, combined together. For
example, $http_foo variable will be "a, b" if there are "Foo: a" and
"Foo: b" headers in the request.
Note that $upstream_http_set_cookie will also return all "Set-Cookie"
headers (ticket #1843), though this might not be what one want, since
the "Set-Cookie" header does not follow the list syntax (see RFC 7230,
section 3.2.2).
The uwsgi specification states that "The uwsgi block vars represent a
dictionary/hash". This implies that no duplicate headers are expected.
Further, provided headers are expected to follow CGI specification,
which also requires to combine headers (RFC 3875, section "4.1.18.
Protocol-Specific Meta-Variables"): "If multiple header fields with
the same field-name are received then the server MUST rewrite them
as a single value having the same semantics".
SCGI specification explicitly forbids headers with duplicate names
(section "3. Request Format"): "Duplicate names are not allowed in
the headers".
Further, provided headers are expected to follow CGI specification,
which also requires to combine headers (RFC 3875, section "4.1.18.
Protocol-Specific Meta-Variables"): "If multiple header fields with
the same field-name are received then the server MUST rewrite them
as a single value having the same semantics".
FastCGI responder is expected to receive CGI/1.1 environment variables
in the parameters (see section "6.2 Responder" of the FastCGI specification).
Obviously enough, there cannot be multiple environment variables with
the same name.
Further, CGI specification (RFC 3875, section "4.1.18. Protocol-Specific
Meta-Variables") explicitly requires to combine headers: "If multiple
header fields with the same field-name are received then the server MUST
rewrite them as a single value having the same semantics".
Previously, the r->header_in->connection pointer was never set despite
being present in ngx_http_headers_in, resulting in incorrect value returned
by $r->header_in("Connection") in embedded perl.
In 7583:efd71d49bde0 (nginx 1.17.5) along with introduction of the
ioctl(FIONREAD) support proper handling of systems without EPOLLRDHUP
support in the kernel (but with EPOLLRDHUP in headers) was broken.
Before the change, rev->available was never set to 0 unless
ngx_use_epoll_rdhup was also set (that is, runtime test for EPOLLRDHUP
introduced in 6536:f7849bfb6d21 succeeded). After the change,
rev->available might reach 0 on systems without runtime EPOLLRDHUP
support, stopping further reading in ngx_readv_chain() and ngx_unix_recv().
And, if EOF happened to be already reported along with the last event,
it is not reported again by epoll_wait(), leading to connection hangs
and timeouts on such systems.
This affects Linux kernels before 2.6.17 if nginx was compiled
with newer headers, and, more importantly, emulation layers, such as
DigitalOcean's App Platform's / gVisor's epoll emulation layer.
Fix is to explicitly check ngx_use_epoll_rdhup before the corresponding
rev->pending_eof tests in ngx_readv_chain() and ngx_unix_recv().
Previously, QUIC used the existing UDP framework, which was created for UDP in
Stream. However the way QUIC connections are created and looked up is different
from the way UDP connections in Stream are created and looked up. Now these
two implementations are decoupled.
Previously, last buffer was tracked by keeping a pointer to the previous
chain link "next" field. When the previous buffer was split and then removed,
the pointer was no longer valid. Writing at this pointer resulted in broken
data chains.
Now last buffer is tracked by keeping a direct pointer to it.
The object is used instead of ngx_chain_t pointer for buffer operations like
ngx_quic_write_chain() and ngx_quic_read_chain(). These functions are renamed
to ngx_quic_write_buffer() and ngx_quic_read_buffer().
Such fatal errors are reported by OpenSSL 1.1.1, and similarly by BoringSSL,
if application data is encountered during SSL shutdown, which started to be
observed on the second SSL_shutdown() call after SSL shutdown fixes made in
09fb2135a589 (1.19.2). The error means that the client continues to send
application data after receiving the "close_notify" alert (ticket #2318).
Previously it was reported as SSL_shutdown() error of SSL_ERROR_SYSCALL.
Now ngx_quic_stream_t is decoupled from ngx_connection_t in a way that it
can persist after connection is closed by application. During this period,
server is expecting stream final size from client for correct flow control.
Also, buffered output is sent to client as more flow control credit is granted.
As shown in RFC 8446, section 2.2, Figure 3, and further specified in
section 4.6.1, BoringSSL releases session tickets in Application Data
(along with Finished) early, based on a precalculated client Finished
transcript, once client signalled early data in extensions.
Initially, frames are genereated and stored in ctx->frames.
Next, ngx_quic_output() collects frames to be sent in in ctx->sending.
On failure, ngx_quic_revert_sned() returns frames into ctx->frames.
On success, the ngx_quic_commit_send() moves ack-eliciting frames into
ctx->sent and frees non-ack-eliciting frames.
This function also updates in-flight bytes counter, so only actually sent
frames are accounted.
The counter is decremented in the following cases:
- acknowledgment is received
- packet was declared lost
- we are discarding context completely
In each of this cases frame is removed from ctx->sent queue and in-flight
counter is accordingly decremented.
The patch fixes the case of discarding context - only removing frames
from ctx->sent must be followed by in-flight bytes counter decrement,
otherwise cg->in_flight could experience type underflow.
The issue appeared in b1676cd64dc9.
The cd8018bc81a5 fixed unintended send of non-padded initial packets,
but failed to restore context properly: only processed contexts need
to be restored. As a consequence, a packet number could be restored
from uninitialized value.
Previously, the flag could be reset after send_chain() with a limit, even
though there was room for more data. The application then started waiting for
a write event notification, which never happened.
Now the wev->ready flag is only reset when flow control is exhausted.
With large http2_max_concurrent_streams or http2_max_concurrent_pushes, more
than 255 ngx_http_v2_node_t structures might be allocated, eventually leading
to h2c->closed_nodes overflow when closing corresponding streams. This will
in turn result in additional allocations in ngx_http_v2_get_node_by_id().
While mostly harmless, it can result in excessive memory usage by a HTTP/2
connection, notably in configurations with many keepalive_requests allowed.
Fix is to use ngx_uint_t for h2c->closed_nodes instead of unsigned:8.
The switch happens when received byte counter reaches stream final size.
Previously, this state was skipped. The stream went from SIZE_KNOWN to
DATA_READ when all bytes were read by application.
The change prevents STOP_SENDING frames from being sent when all data is
received from client, but not yet fully read by application.
Previously, size was calculated based on the number of input bytes processed
by the function. Now only the copied bytes are considered. This prevents
overlapping buffers from contributing twice to the overall written size.
Response headers can be buffered in the SSL buffer. But stream's fake
connection buffered flag did not reflect this, so any attempts to flush
the buffer without sending additional data were stopped by the write filter.
It does not seem to be possible to reflect this in fc->buffered though, as
we never known if main connection's c->buffered corresponds to the particular
stream or not. As such, fc->buffered might prevent request finalization
due to sending data on some other stream.
Fix is to implement handling of flush buffers when the c->need_flush_buf
flag is set, similarly to the existing last buffer handling. The same
flag is now used for UDP sockets in the stream module instead of explicit
checking of c->type.
Previously, non-padded initial packet could be sent as a result of the
following situation:
- initial queue is not empty (so padding to 1200 is required)
- handshake queue is not empty (so padding is to be added after h/s packet)
- path is limited
If serializing handshake packet would violate path limit, such packet was
omitted, and the non-padded initial packet was sent.
The fix is to avoid sending the packet at all in such case. This follows the
original intention introduced in c5155a0cb12f.
During configuration reload two cache managers might exist for a short
time. If both tried to delete the same cache node, the "ignore long locked
inactive cache entry" alert appeared in logs. Additionally,
ngx_http_file_cache_forced_expire() might be also called by worker
processes, with similar results.
Fix is to ignore cache nodes being deleted, similarly to how it is
done in ngx_http_file_cache_expire() since 3755:76e3a93821b1. This
was somehow missed in 7002:ab199f0eb8e8, when ignoring long locked
cache entries was introduced in ngx_http_file_cache_forced_expire().
- wording in log->action is adjusted to match function names.
- connection close steps are made obvious and start with "quic close" prefix:
*1 quic close initiated rc:-4
*1 quic close silent drain:0 timedout:1
*1 quic close resumed rc:-1
*1 quic close resumed rc:-1
*1 quic close resumed rc:-4
*1 quic close completed
this makes it easy to understand if particular "close" record is an initial
cause or lasting process, or the final one.
- cases of close without quic connection now logged as "packet rejected":
*14 quic run
*14 quic packet rx long flags:ec version:1
*14 quic packet rx hs len:61
*14 quic packet rx dcid len:20 00000000000002c32f60e4aa2b90a64a39dc4228
*14 quic packet rx scid len:8 81190308612cd019
*14 quic expected initial, got handshake
*14 quic packet done rc:-1 level:hs decr:0 pn:0 perr:0
*14 quic packet rejected rc:-1, cleanup connection
*14 reusable connection: 0
this makes it easy to spot early packet rejection and avoid confuse with
quic connection closing (which in fact was not even created).
- packet processing summary now uses same prefix "quic packet done rc:"
- added debug to places where packet was rejected without any reason logged
The NGX_DECLINED is replaced with NGX_DONE to match closer to return code
of ngx_quic_handle_packet() and ngx_quic_close_connection() rc argument.
The ngx_quic_close_connection() rc code is used only when quic connection
exists, thus anything goes if qc == NULL.
The ngx_quic_handle_datagram() does not return NG_OK in cases when quic
connection is not yet created.
Previously, closure detection for server-initiated uni streams was not properly
implemented. Instead, HTTP/3 code relied on QUIC code posting the read event
and setting rev->error when it needed to close the stream. Then, regular
uni stream read handler called c->recv() and received error, which closed the
stream. This was an ad-hoc solution. If, for whatever reason, the read
handler was called earlier, c->recv() would return 0, which would also close
the stream.
Now server-initiated uni streams have a separate read event handler for
tracking stream closure. The handler calls c->recv(), which normally returns
0, but may return error in case of closure.
Sending the instruction is delayed until the end of the current event cycle.
Delaying the instruction is allowed by quic-qpack-21, section 2.2.2.3.
The goal is to reduce the amount of data sent back to client by accumulating
several inserts in one instruction and sometimes not sending the instruction at
all, if Section Acknowledgement was sent just before it.
Operations like ngx_quic_open_stream(), ngx_http_quic_get_connection(),
ngx_http_v3_finalize_connection(), ngx_http_v3_shutdown_connection() used to
receive a QUIC stream connection. Now they can receive the main QUIC
connection as well. This is useful when calling them from a stream context.
This is to ease transition with oldish BoringSSL versions,
the default for SSL_set_quic_use_legacy_codepoint() has been
flipped in BoringSSL a1d3bfb64fd7ef2cb178b5b515522ffd75d7b8c5.
Chrome only uses TLS session tickets once with TLS 1.3, likely following
RFC 8446 Appendix C.4 recommendation. With OpenSSL, this works fine with
built-in session tickets, since these are explicitly renewed in case of
TLS 1.3 on each session reuse, but results in only two connections being
reused after an initial handshake when using ssl_session_ticket_key.
Fix is to always renew TLS session tickets in case of TLS 1.3 when using
ssl_session_ticket_key, similarly to how it is done by OpenSSL internally.
Line continuation as used in the syntax file might be broken if "compatible"
is set or "C" is added to cpoptions. Fix is to set the "cpoptions" option
to vim default value at script start and restore it later, see
":help use-cpo-save".
Previously, when input ended on a QUIC buffer boundary, input chain was not
advanced to the next buffer. As a result, ngx_quic_write_chain() returned
a chain with an empty buffer instead of NULL. This broke HTTP write filter,
preventing it from closing the HTTP request and eventually timing out.
Now input chain is always advanced to a buffer that has data, before checking
QUIC buffer boundary condition.
RFC 9000, 9.3. Responding to Connection Migration:
An endpoint only changes the address to which it sends packets in
response to the highest-numbered non-probing packet.
The patch extends this requirement to probing packets. Although it may
seem excessive, it helps with mitigation of reply attacks (when an off-path
attacker has copied packet with PATH_CHALLENGE and uses different
addresses to exhaust available connection ids).
The quic connection now holds active, backup and probe paths instead
of sockets. The number of migration paths is now limited and cannot
be inflated by a bad client or an attacker.
The client id is now associated with path rather than socket. This allows
to simplify processing of output and connection ids handling.
New migration abandons any previously started migrations. This allows to
free consumed client ids and request new for use in future migrations and
make progress in case when connection id limit is hit during migration.
A path now can be revalidated without losing its state.
The patch also fixes various issues with NAT rebinding case handling:
- paths are now validated (previously, there was no validation
and paths were left in limited state)
- attempt to reuse id on different path is now again verified
(this was broken in 40445fc7c403)
- former path is now validated in case of apparent migration
The function splits a buffer at given offset. The function is now
called from ngx_quic_read_chain() and ngx_quic_write_chain(), which
simplifies both functions.
After 9ae239d2547d, ngx_quic_handle_write_event() no longer runs into
ngx_send_lowat() for QUIC connections, so the check became excessive.
It is assumed that external modules operating with SO_SNDLOWAT
(I'm not aware of any) should do this check on their own.
Previously, ngx_quic_write_chain() treated each input buffer as a memory
buffer, which is not always the case. Special buffers were not skipped, which
is especially important when hitting the input byte limit.
The issue manifested itself with ngx_quic_write_chain() returning a non-empty
chain consisting of a special last_buf buffer when called from QUIC stream
send_chain(). In order for this to happen, input byte limit should be equal to
the chain length, and the input chain should end with an empty last_buf buffer.
An easy way to achieve this is the following:
location /empty {
return 200;
}
When this non-empty chain was returned from send_chain(), it signalled to the
caller that input was blocked, while in fact it wasn't. This prevented HTTP
request from finalization, which prevented QUIC from sending STREAM FIN to
the client. The QUIC stream was then reset after a timeout.
Now special buffers are skipped and send_chain() returns NULL in the case
above, which signals to the caller a successful operation.
Also, original byte limit is now passed to ngx_quic_write_chain() from
send_chain() instead of actual chain length to make sure it's never zero.
Previously, when a STREAM FIN frame with no data bytes was received after all
prior stream data were already read by the application layer, the frame was
ignored and eof was not reported to the application.
When a worker process is shutting down, keepalive is not used: this is checked
before the ngx_http_set_keepalive() call in ngx_http_finalize_connection().
Yet the "Connection: keep-alive" header was still sent, even if we know that
the worker process is shutting down, potentially resulting in additional
requests being sent to the connection which is going to be closed anyway.
While clients are expected to be able to handle asynchronous close events
(see ticket #1022), it is certainly possible to send the "Connection: close"
header instead, informing the client that the connection is going to be closed
and potentially saving some unneeded work.
With this change, we additionally check for worker process shutdown just
before sending response headers, and disable keepalive accordingly.
Linux with EPOLLEXCLUSIVE usually notifies only the process which was first
to add the listening socket to the epoll instance. As a result most of the
connections are handled by the first worker process (ticket #2285). To fix
this, we re-add the socket periodically, so other workers will get a chance
to accept connections.
The SF_NOCACHE flag, introduced in FreeBSD 11 along with the new non-blocking
sendfile() implementation by glebius@, makes it possible to use sendfile()
along with the "directio" directive.
Starting with FreeBSD 11, there is no need to use AIO operations to preload
data into cache for sendfile(SF_NODISKIO) to work. Instead, sendfile()
handles non-blocking loading data from disk by itself. It still can, however,
return EBUSY if a page is already being loaded (for example, by a different
process). If this happens, we now post an event for the next event loop
iteration, so sendfile() is retried "after a short period", as manpage
recommends.
The limit of the number of EBUSY tolerated without any progress is preserved,
but now it does not result in an alert, since on an idle system event loop
iteration might be very short and EBUSY can happen many times in a row.
Instead, SF_NODISKIO is simply disabled for one call once the limit is
reached.
With this change, sendfile(SF_NODISKIO) is now used automatically as long as
sendfile() is enabled, and no longer requires "aio on;".
It was mostly copy of the ngx_quic_listen(). Now ngx_quic_listen() no
longer generates server id and increments seqnum. Instead, the server
id is generated when the socket is created.
The ngx_quic_alloc_socket() function is renamed to ngx_quic_create_socket().
Notably, NAXSI is known to misuse ngx_regex_compile() with rc.options set
to PCRE_CASELESS | PCRE_MULTILINE. With PCRE2 support, and notably binary
compatibility changes, it is no longer possible to set PCRE[2]_MULTILINE
option without using proper interface. To facilitate correct usage,
this change adds the NGX_REGEX_MULTILINE option.
With this change, dynamic modules using nginx regex interface can be used
regardless of the variant of the PCRE library nginx was compiled with.
If a module is compiled with different PCRE library variant, in case of
ngx_regex_exec() errors it will report wrong function name in error
messages. This is believed to be tolerable, given that fixing this will
require interface changes.
The PCRE2 library is now used by default if found, instead of the
original PCRE library. If needed for some reason, this can be disabled
with the --without-pcre2 configure option.
To make it possible to specify paths to the library and include files
via --with-cc-opt / --with-ld-opt, the library is first tested without
any additional paths and options. If this fails, the pcre2-config script
is used.
Similarly to the original PCRE library, it is now possible to build PCRE2
from sources with nginx configure, by using the --with-pcre= option.
It automatically detects if PCRE or PCRE2 sources are provided.
Note that compiling PCRE2 10.33 and later requires inttypes.h. When
compiling on Windows with MSVC, inttypes.h is only available starting
with MSVC 2013. In older versions some replacement needs to be provided
("echo '#include <stdint.h>' > pcre2-10.xx/src/inttypes.h" is good enough
for MSVC 2010).
The interface on nginx side remains unchanged.
If a configuration parsing fails for some reason, ngx_regex_module_init()
is not called, and ngx_pcre_studies remained set despite the fact that
the pool it was allocated from is already freed. This might result in
a segmentation fault during runtime regular expression compilation, such
as in SSI, for example, in the single process mode, or if a worker process
died and was respawned from a master process in such an inconsistent state.
Fix is to clear ngx_pcre_studies from the pool cleanup handler (which is
anyway used to free JIT-compiled patterns).
Previously, buffer lists was used to track used buffers. Now reference
counter is used instead. The new implementation is simpler and faster with
many buffer clones.
They are replaced with ngx_quic_write_chain() and ngx_quic_read_chain().
These functions represent the API to data buffering.
The first function adds data of given size at given offset to the buffer.
Now it returns the unwritten part of the chain similar to c->send_chain().
The second function returns data of given size from the beginning of the buffer.
Its second argument and return value are swapped compared to
ngx_quic_split_bufs() to better match ngx_quic_write_chain().
Added, returned and stored data are regular ngx_chain_t/ngx_buf_t chains.
Missing data is marked with b->sync flag.
The functions are now used in both send and recv data chains in QUIC streams.
Previously, when a few bytes were send to a QUIC stream by the application, a
4K buffer was allocated for these bytes. Then a STREAM frame was created and
that entire buffer was used as data for that frame. The frame with the buffer
were in use up until the frame was acked by client. Meanwhile, when more
bytes were send to the stream, more buffers were allocated and assigned as
data to newer STREAM frames. In this scenario most buffer memory is unused.
Now the unused part of the stream output buffer is available for further
stream output while earlier parts of the buffer are waiting to be acked.
This is achieved by splitting the output buffer.
The output is always sent to the active path, which is stored in the
quic connection. There is no need to pass it in arguments.
When output has to be send to to a specific path (in rare cases, such as
path probing), a separate method exists (ngx_quic_frame_sendto()).
The path validation status and anti-amplification limit status is actually
two different variables. It is possible that validating path should not
be limited (for example, when re-validating former path).
Previously, path was considered valid during arbitrary selected 10m timeout
since validation. This is quite not what RFC 9000 says; the relevant
part is:
An endpoint MAY skip validation of a peer address if that
address has been seen recently.
The patch considers a path to be 'recently seen' if packets were received
during idle timeout. If a packet is received from the path that was seen
not so recently, such path is considered new, and anti-amplification
restrictions apply.
After creation, a client stream is added to qc->streams.uninitialized queue.
After initialization it's removed from the queue. If a stream is never
initialized, it is freed in ngx_quic_close_streams(). Stream initializer
is now set as read event handler in stream connection.
Previously qc->streams.uninitialized was used only for delayed stream
initialization.
The change makes it possible not to handle separately the case of a new stream
in stream-related frame handlers. It makes these handlers simpler since new
streams and existing streams are now handled by the same code.
With sendfile() in threads ("aio threads; sendfile on;"), client connection
can block on writing, waiting for sendfile() to complete. In HTTP/2 this
might result in the request hang, since an attempt to continue processing
in thread event handler will call request's write event handler, which
is usually stopped by ngx_http_v2_send_chain(): it does nothing if there
are no additional data and stream->queued is set. Further, HTTP/2 resets
stream's c->write->ready to 0 if writing blocks, so just fixing
ngx_http_v2_send_chain() is not enough.
Can be reproduced with test suite on Linux with:
TEST_NGINX_GLOBALS_HTTP="aio threads; sendfile on;" prove h2*.t
The following tests currently fail: h2_keepalive.t, h2_priority.t,
h2_proxy_max_temp_file_size.t, h2.t, h2_trailers.t.
Similarly, sendfile() with AIO preloading on FreeBSD can block as well,
with similar results. This is, however, harder to reproduce, especially
on modern FreeBSD systems, since sendfile() usually does not return EBUSY.
Fix is to modify ngx_http_v2_send_chain() so it actually tries to send
data to the main connection when called, and to make sure that
c->write->ready is set by the relevant event handlers.
With sendfile in threads, "task already active" alerts might appear in logs
if a write event happens on the main HTTP/2 connection, triggering a sendfile
in threads while another thread operation is already running. Observed
with "aio threads; aio_write on; sendfile on;" and with thread event handlers
modified to post a write event to the main HTTP/2 connection (though can
happen without any modifications).
Similarly, sendfile() with AIO preloading on FreeBSD can trigger duplicate
aio operation, resulting in "second aio post" alerts. This is, however,
harder to reproduce, especially on modern FreeBSD systems, since sendfile()
usually does not return EBUSY.
Fix is to avoid starting a sendfile operation if other thread operation
is active by checking r->aio in the thread handler (and, similarly, in
aio preload handler). The added check also makes duplicate calls protection
redundant, so it is removed.
The SSL_OP_ENABLE_MIDDLEBOX_COMPAT option is provided by QuicTLS and enabled
by default in the newly created SSL contexts. SSL_set_quic_method() is used
to clear it, which is required for SSL handshake to work on QUIC connections.
Switching context in the ngx_http_ssl_servername() SNI callback overrides SSL
options from the new SSL context. This results in the option set again.
Fix is to explicitly clear it when switching to another SSL context.
Initially reported here (in Russian):
http://mailman.nginx.org/pipermail/nginx-ru/2021-November/063989.html
ngx_http_v3_tables.h and ngx_http_v3_tables.c are renamed to
ngx_http_v3_table.h and ngx_http_v3_table.c to better match HTTP/2 code.
ngx_http_v3_streams.h and ngx_http_v3_streams.c are renamed to
ngx_http_v3_uni.h and ngx_http_v3_uni.c to better match their content.
Directives that set transport parameters are removed from the configuration.
Corresponding values are derived from the quic configuration or initialized
to default. Whenever possible, quic configuration parameters are taken from
higher-level protocol settings, i.e. HTTP/3.
A new variable $http3 is added. The variable equals to "h3" for HTTP/3
connections, "hq" for hq connections and is an empty string otherwise.
The variable $quic is eliminated.
The new variable is similar to $http2 variable.
RFC 9000 19.16
The sequence number specified in a RETIRE_CONNECTION_ID frame MUST NOT
refer to the Destination Connection ID field of the packet in which the
frame is contained.
Before the patch, the RETIRE_CONNECTION_ID frame was sent before switching
to the new client id. If retired client id was currently in use, this lead
to violation of the spec.
The c->udp->dgram may be NULL only if the quic connection was just
created: the ngx_event_udp_recvmsg() passes information about datagrams
to existing connections by providing information in c->udp.
If case of a new connection, c->udp is allocated by the QUIC code during
creation of quic connection (it uses c->sockaddr to initialize qsock->path).
Thus the check for qsock->path is excessive and can be read wrong, assuming
that other options possible, leading to warnings from clang static analyzer.
Removed sending CLOSE_CONNECTION directly to avoid duplicate frames,
since it is sent later again in SSL_do_handshake() error handling.
As such, removed redundant settings of error fields set elsewhere.
While here, improved debug message.
All open sockets are stored in a queue. There is no need to close some
of them separately. If it happens that active and backup point to same
socket, double close may happen (leading to possible segfault).
The RFC 9000 allows a packet from known CID arrive from unknown path:
These requirements regarding connection ID reuse apply only to the
sending of packets, as unintentional changes in path without a change
in connection ID are possible. For example, after a period of
network inactivity, NAT rebinding might cause packets to be sent on a
new path when the client resumes sending.
Before the patch, such packets were rejected with an error in the
ngx_quic_check_migration() function. Removing the check makes the
separate function excessive - remaining checks are early migration
check and "disable_active_migration" check. The latter is a transport
parameter sent to client and it should not be used by server.
The server should send "disable_active_migration" "if the endpoint does
not support active connection migration" (18.2). The support status depends
on nginx configuration: to have migration working with multiple workers,
you need bpf helper, available on recent Linux systems. The patch does
not set "disable_active_migration" automatically and leaves it for the
administrator. By default, active migration is enabled.
RFC 900 says that it is ok to migrate if the peer violates
"disable_active_migration" flag requirements:
If the peer violates this requirement,
the endpoint MUST either drop the incoming packets on that path without
generating a Stateless Reset
OR
proceed with path validation and allow the peer to migrate. Generating a
Stateless Reset or closing the connection would allow third parties in the
network to cause connections to close by spoofing or otherwise manipulating
observed traffic.
So, nginx adheres to the second option and proceeds to path validation.
Note:
The ngtcp2 may be used for testing both active migration and NAT rebinding:
ngtcp2/client --change-local-addr=200ms --delay-stream=500ms <ip> <port> <url>
ngtcp2/client --change-local-addr=200ms --delay-stream=500ms --nat-rebinding \
<ip> <port> <url>
Single UDP datagram may contain multiple QUIC datagrams. In order to
facilitate handling of such cases, 'first' flag in the ngx_quic_header_t
structure is introduced.
Previously, the retired socket was not closed if it didn't match
active or backup.
New sockets could not be created (due to count limit), since retired socket
was not closed before calling ngx_quic_create_sockets().
When replacing retired socket, new socket is only requested after closing
old one, to avoid hitting the limit on the number of active connection ids.
Together with added restrictions, this fixes an issue when a current socket
could be closed during migration, recreated and erroneously reused leading
to null pointer dereference.
Previously the frame was not handled and connection was closed with an error.
Now, after receiving this frame, global flow control is updated and new
flow control credit is sent to client.
Previously, after receiving STREAM_DATA_BLOCKED, current flow control limit
was sent to client. Now, if the limit can be updated to the full window size,
it is updated and the new value is sent to client, otherwise nothing is sent.
The change lets client update flow control credit on demand. Also, it saves
traffic by not sending MAX_STREAM_DATA with the same value twice.
The reasons why a stream may not be created by server currently include hitting
worker_connections limit and memory allocation error. Previously in these
cases the entire QUIC connection was closed and all its streams were shut down.
Now the new stream is rejected and existing streams continue working.
To reject an HTTP/3 request stream, RESET_STREAM and STOP_SENDING with
H3_REQUEST_REJECTED error code are sent to client. HTTP/3 uni streams and
Stream streams are not rejected.
The variable contains a negotiated curve used for the handshake key
exchange process. Known curves are listed by their names, unknown
ones are shown in hex.
Note that for resumed sessions in TLSv1.2 and older protocols,
$ssl_curve contains the curve used during the initial handshake,
while in TLSv1.3 it contains the curve used during the session
resumption (see the SSL_get_negotiated_group manual page for
details).
The variable is only meaningful when using OpenSSL 3.0 and above.
With older versions the variable is empty.
Without this change, aio used with HTTP/2 can result in connection hang,
as observed with "aio threads; aio_write on;" and proxying (ticket #2248).
The problem is that HTTP/2 updates buffers outside of the output filters
(notably, marks them as sent), and then posts a write event to call
output filters. If a filter does not call the next one for some reason
(for example, because of an AIO operation in progress), this might
result in a state when the owner of a buffer already called
ngx_chain_update_chains() and can reuse the buffer, while the same buffer
is still sitting in the busy chain of some other filter.
In the particular case a buffer was sitting in output chain's ctx->busy,
and was reused by event pipe. Output chain's ctx->busy was permanently
blocked by it, and this resulted in connection hang.
Fix is to change ngx_chain_update_chains() to skip buffers from other
modules unconditionally, without trying to wait for these buffers to
become empty.
The "sendfile_max_chunk" directive is important to prevent worker
monopolization by fast connections. The 2m value implies maximum 200ms
delay with 100 Mbps links, 20ms delay with 1 Gbps links, and 2ms on
10 Gbps links. It also seems to be a good value for disks.
Previously, connections to upstream servers used sendfile() if it was
enabled, but never honored sendfile_max_chunk. This might result
in worker monopolization for a long time if large request bodies
are allowed.
On Linux starting with 2.6.16, sendfile() silently limits all operations
to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly
triggered the interrupt check, and resulted in 0-sized writev() on the
next loop iteration.
Fix is to make sure the limit is always checked, so we will return from
the loop if the limit is already reached even if number of bytes sent is
not exactly equal to the number of bytes we've tried to send.
Previously, it was checked that sendfile_max_chunk was enabled and
almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
delaying connections where sendfile_max_chunk wasn't reached (for example,
when sending responses smaller than sendfile_max_chunk). Now we instead
check if there are unsent data, and the connection is still ready for writing.
Additionally we also check c->write->delayed to ignore connections already
delayed by limit_rate.
This approach is believed to be more robust, and correctly handles
not only sendfile_max_chunk, but also internal limits of c->send_chain(),
such as sendfile() maximum supported length (ticket #1870).
Previously, 1 millisecond delay was used instead. In certain edge cases
this might result in noticeable performance degradation though, notably on
Linux with typical CONFIG_HZ=250 (so 1ms delay becomes 4ms),
sendfile_max_chunk 2m, and link speed above 2.5 Gbps.
Using posted next events removes the artificial delay and makes processing
fast in all cases.
The directive enables including all frames from start time to the most recent
key frame in the result. Those frames are removed from presentation timeline
using mp4 edit lists.
Edit lists are currently supported by popular players and browsers such as
Chrome, Safari, QuickTime and ffmpeg. Among those not supporting them properly
is Firefox[1].
Based on a patch by Tracey Jaquith, Internet Archive.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1735300
The function updates the duration field of mdhd atom. Previously it was
updated in ngx_http_mp4_read_mdhd_atom(). The change makes it possible to
alter track duration as a result of processing track frames.
As per quic-qpack-21:
When a stream is reset or reading is abandoned, the decoder emits a
Stream Cancellation instruction.
Previously the instruction was not sent. Now it's sent when closing QUIC
stream connection if dynamic table capacity is non-zero and eof was not
received from client. The latter condition means that a trailers section
may still be on its way from client and the stream needs to be cancelled.
A QUIC stream connection is treated as reusable until first bytes of request
arrive, which is also when the request object is now allocated. A connection
closed as a result of draining, is reset with the error code
H3_REQUEST_REJECTED. Such behavior is allowed by quic-http-34:
Once a request stream has been opened, the request MAY be cancelled
by either endpoint. Clients cancel requests if the response is no
longer of interest; servers cancel requests if they are unable to or
choose not to respond.
When the server cancels a request without performing any application
processing, the request is considered "rejected." The server SHOULD
abort its response stream with the error code H3_REQUEST_REJECTED.
The client can treat requests rejected by the server as though they had
never been sent at all, thereby allowing them to be retried later.
When an HTTP/3 function returns an error in context of a QUIC stream, it's
this function's responsibility now to finalize the entire QUIC connection
with the right code, if required. Previously, QUIC connection finalization
could be done both outside and inside such functions. The new rule follows
a similar rule for logging, leads to cleaner code, and allows to provide more
details about the error.
While here, a few error cases are no longer treated as fatal and QUIC connection
is no longer finalized in these cases. A few other cases now lead to
stream reset instead of connection finalization.
The PATH_RESPONSE frame must be expanded to 1200, except the case
when anti-amplification limit is in effect, i.e. on unvalidated paths.
Previously, the anti-amplification limit was always applied.
If client ID was never used, its refcount is zero. To keep things simple,
the ngx_quic_unref_client_id() function is now aware of such IDs.
If client ID was used, the ngx_quic_replace_retired_client_id() function
is supposed to find all users and unref the ID, thus ngx_quic_unref_client_id()
should not be called after it.
Previously, it was not enforced in the stream module.
Now, since b9e02e9b2f1d it is possible to specify protocols.
Since ALPN is always required, the 'require_alpn' setting is now obsolete.
The "min" and "max" arguments refer to UDP datagram size. Generating payload
requires to account properly for header size, which is variable and depends on
payload size and packet number.
After fe919fd63b0b, processing QUIC streams was postponed until after handshake
completion, which means that 0-RTT is effectively off. With ssl_ocsp enabled,
it could be further delayed. This differs from how OCSP validation works with
SSL_read_early_data(). With this change, processing QUIC streams is unlocked
when obtaining 0-RTT secret.
The sent queue is sorted by packet number. It is possible to avoid
traversing full queue while handling ack ranges. It makes sense to
start traversing from the queue head (i.e. check oldest packets first).
With this patch, all traffic over a QUIC connection is compared to traffic
over QUIC streams. As long as total traffic is many times larger than stream
traffic, we consider this to be a flood.
With this patch, all traffic over HTTP/3 bidi and uni streams is counted in
the h3c->total_bytes field, and payload traffic is counted in the
h3c->payload_bytes field. As long as total traffic is many times larger than
payload traffic, we consider this to be a flood.
Request header traffic is counted as if all fields are literal. Response
header traffic is counted as is.
Checking the reset after encryption avoids false positives. More importantly,
it avoids the check entirely in the usual case where decryption succeeds.
RFC 9000, 10.3.1 Detecting a Stateless Reset
Endpoints MAY skip this check if any packet from a datagram is
successfully processed.
As per RFC 9000:
An endpoint that receives a STOP_SENDING frame MUST send a RESET_STREAM
frame if the stream is in the "Ready" or "Send" state.
An endpoint SHOULD copy the error code from the STOP_SENDING frame to
the RESET_STREAM frame it sends, but it can use any application error code.
The flag indicates that the entire response was sent to the socket up to the
last_buf flag. The flag is only usable for protocol implementations that call
ngx_http_write_filter() from header filter, such as HTTP/1.x and HTTP/3.
Similar to the previous change, a segmentation fault occurres when evaluating
SSL certificates on a QUIC connection due to an uninitialized stream session.
The fix is to adjust initializing the QUIC part of a connection until after
it has session and variables initialized.
Similarly, this appends logging error context for QUIC connections:
- client 127.0.0.1:54749 connected to 127.0.0.1:8880 while handling frames
- quic client timed out (60: Operation timed out) while handling quic input
A QUIC connection doesn't have c->log->data and friends initialized to sensible
values. Yet, a request can be created in the certificate callback with such an
assumption, which leads to a segmentation fault due to null pointer dereference
in ngx_http_free_request(). The fix is to adjust initializing the QUIC part of
a connection such that it has all of that in place.
Further, this appends logging error context for unsuccessful QUIC handshakes:
- cannot load certificate .. while handling frames
- SSL_do_handshake() failed .. while sending frames
OpenSSL library QUIC support cannot be tested at configure time when
using the --with-openssl option so assume it's present if requested.
While here, fixed the error message in case QUIC support is missing.
Previously the counter was not incremented for HTTP/3 streams, but still
decremented in ngx_http_close_connection(). There are two solutions here, one
is to increment the counter for HTTP/3 streams, and the other one is not to
decrement the counter for HTTP/3 streams. The latter solution looks
inconsistent with ngx_stat_reading/ngx_stat_writing, which are incremented on a
per-request basis. The change adds ngx_stat_active increment for HTTP/3
request and push streams.
Notably, it is to avoid setting the TCP_NODELAY flag for QUIC streams
in ngx_http_upstream_send_response(). It is an invalid operation on
inherently SOCK_DGRAM sockets, which leads to QUIC connection close.
The change reduces diff to the default branch in stream content phase.
This function was only referenced from ngx_http_v3_create_push_request() to
initialize push connection log. Now the log handler is copied from the parent
request connection.
The change reduces diff to the default branch.
The functions ngx_quic_handle_read_event() and ngx_quic_handle_write_event()
are added. Previously this code was a part of ngx_handle_read_event() and
ngx_handle_write_event().
The change simplifies ngx_handle_read_event() and ngx_handle_write_event()
by moving QUIC-related code to a QUIC source file.
Previously it had -1 as fd. This fixes proxying, which relies on downstream
connection having a real fd. Also, this reduces diff to the default branch for
ngx_close_connection().
The request body filter chain is no longer called after processing
a DATA frame. Instead, we now post a read event to do this. This
ensures that multiple small DATA frames read during the same event loop
iteration are coalesced together, resulting in much faster processing.
Since rb->buf can now contain unprocessed data, window update is no
longer sent in ngx_http_v2_state_read_data() in case of flow control
being used due to filter buffering. Instead, window will be updated
by ngx_http_v2_read_client_request_body_handler() in the posted read
event.
Following rb->filter_need_buffering changes, request body reading is
only finished after the filter chain is called and rb->last_saved is set.
As such, with r->request_body_no_buffering, timer on fc->read is no
longer removed when the last part of the body is received, potentially
resulting in incorrect behaviour.
The fix is to call ngx_http_v2_process_request_body() from the
ngx_http_v2_read_unbuffered_request_body() function instead of
directly calling ngx_http_v2_filter_request_body(), so the timer
is properly removed.
In the body read handler, the window was incorrectly calculated
based on the full buffer size instead of the amount of free space
in the buffer. If the request body is buffered by a filter, and
the buffer is not empty after the read event is generated by the
filter to resume request body processing, this could result in
"http2 negative window update" alerts.
Further, in the body ready handler and in ngx_http_v2_state_read_data()
the buffer wasn't cleared when the data were already written to disk,
so the client might stuck without window updates.
If a MAX_DATA frame was received before any stream was created, then the worker
process would crash in nginx_quic_handle_max_data_frame() while traversing the
stream tree. The issue is solved by adding a check that makes sure the tree is
not empty.
If a filter wants to buffer the request body during reading (for
example, to check an external scanner), it can now do so. To make
it possible, the code now checks rb->last_saved (introduced in the
previous change) along with rb->rest == 0.
Since in HTTP/2 this requires flow control to avoid overflowing the
request body buffer, so filters which need buffering have to set
the rb->filter_need_buffering flag on the first filter call. (Note
that each filter is expected to call the next filter, so all filters
will be able set the flag if needed.)
It indicates that the last buffer was received by the save filter,
and can be used to check this at higher levels. To be used in the
following changes.
If due to an error ngx_http_request_body_save_filter() is called
more than once with rb->rest == 0, this used to result in a segmentation
fault. Added an alert to catch such errors, just in case.
Previously, fully preread unbuffered requests larger than client body
buffer size were saved to disk, despite the fact that "unbuffered" is
expected to imply no disk buffering.
The save body filter saves the request body to disk once the buffer is full.
Yet in HTTP/2 this might happen even if there is no need to save anything
to disk, notably when content length is known and the END_STREAM flag is
sent in a separate empty DATA frame. Workaround is to provide additional
byte in the buffer, so saving the request body won't be triggered.
This fixes unexpected request body disk buffering in HTTP/2 observed after
the previous change when content length is known and the END_STREAM flag
is sent in a separate empty DATA frame.
In particular, now the code always uses a buffer limited by
client_body_buffer_size. At the cost of an additional copy it
ensures that small DATA frames are not directly mapped to small
write() syscalls, but rather buffered in memory before writing.
Further, requests without Content-Length are no longer forced
to use temporary files.
With SSL it is possible that an established connection is ready for
reading after the handshake. Further, events might be already disabled
in case of level-triggered event methods. If this happens and
ngx_http_upstream_send_request() blocks waiting for some data from
the upstream, such as flow control in case of gRPC, the connection
will time out due to no read events on the upstream connection.
Fix is to explicitly check the c->read->ready flag if sending request
blocks and post a read event if it is set.
Note that while it is possible to modify ngx_ssl_handshake() to keep
read events active, this won't completely resolve the issue, since
there can be data already received during the SSL handshake
(see 573bd30e46b4).
Hash initialization ignores elements with key.data set to NULL.
Nevertheless, the initial hash bucket size check didn't skip them,
resulting in unnecessary restrictions on, for example, variables with
long names and with the NGX_HTTP_VARIABLE_NOHASH flag.
Fix is to update the initial hash bucket size check to skip elements
with key.data set to NULL, similarly to how it is done in other parts
of the code.
Requires OpenSSL 3.0 compiled with "enable-ktls" option. Further, KTLS
needs to be enabled in kernel, and in OpenSSL, either via OpenSSL
configuration file or with "ssl_conf_command Options KTLS;" in nginx
configuration.
On FreeBSD, kernel TLS is available starting with FreeBSD 13.0, and
can be enabled with "sysctl kern.ipc.tls.enable=1" and "kldload ktls_ocf"
to load a software backend, see man ktls(4) for details.
On Linux, kernel TLS is available starting with kernel 4.13 (at least 5.2
is recommended), and needs kernel compiled with CONFIG_TLS=y (with
CONFIG_TLS=m, which is used at least on Ubuntu 21.04 by default,
the tls module needs to be loaded with "modprobe tls").
While clock_gettime(CLOCK_MONOTONIC_COARSE) is faster than
clock_gettime(CLOCK_MONOTONIC), the latter is fast enough on Linux for
practical usage, and the difference is negligible compared to other costs
at each event loop iteration. On the other hand, CLOCK_MONOTONIC_COARSE
causes various issues with typical CONFIG_HZ=250, notably very inaccurate
limit_rate handling in some edge cases (ticket #1678) and negative difference
between $request_time and $upstream_response_time (ticket #1965).
This is a recommended behavior by RFC 7301 and is useful
for mitigation of protocol confusion attacks [1].
To avoid possible negative effects, list of supported protocols
was extended to include all possible HTTP protocol ALPN IDs
registered by IANA [2], i.e. "http/1.0" and "http/0.9".
[1] https://alpaca-attack.com/
[2] https://www.iana.org/assignments/tls-extensiontype-values/
In b87b7092cedb (nginx 1.21.1), logging level of "upstream sent invalid
header" errors was accidentally changed to "info". This change restores
the "error" level, which is a proper logging level for upstream-side
errors.
The u->keepalive flag is initialized early if the response has no body
(or an empty body), and needs to be reset if there are any extra data,
similarly to how it is done in ngx_http_proxy_copy_filter(). Missed
in 83c4622053b0.
The "proxy_half_close" directive enables handling of TCP half close. If
enabled, connection to proxied server is kept open until both read ends get
EOF. Write end shutdown is properly transmitted via proxy.
Do this only when the entire request body is empty and
r->request_body_in_file_only is set.
The issue manifested itself with missing warning "a client request body is
buffered to a temporary file" when the entire rb->buf is full and all buffers
are delayed by a filter.
This adds new Auth-SSL-Protocol and Auth-SSL-Cipher headers to
the mail proxy auth protocol when SSL is enabled.
This can be useful for detecting users using older clients that
negotiate old ciphers when you want to upgrade to newer
TLS versions of remove suppport for old and insecure ciphers.
You can use your auth backend to notify these users before the
upgrade that they either need to upgrade their client software
or contact your support team to work out an upgrade path.
To load old/weak server or client certificates it might be needed to adjust
the security level, as introduced in OpenSSL 1.1.0. This change ensures that
ciphers are set before loading the certificates, so security level changes
via the cipher string apply to certificate loading.
Export ciphers are forbidden to negotiate in TLS 1.1 and later protocol modes.
They are disabled since OpenSSL 1.0.2g by default unless explicitly configured
with "enable-weak-ssl-ciphers", and completely removed in OpenSSL 1.1.0.
A new behaviour was introduced in OpenSSL 1.1.1e, when a peer does not send
close_notify before closing the connection. Previously, it was to return
SSL_ERROR_SYSCALL with errno 0, known since at least OpenSSL 0.9.7, and is
handled gracefully in nginx. Now it returns SSL_ERROR_SSL with a distinct
reason SSL_R_UNEXPECTED_EOF_WHILE_READING ("unexpected eof while reading").
This leads to critical errors seen in nginx within various routines such as
SSL_do_handshake(), SSL_read(), SSL_shutdown(). The behaviour was restored
in OpenSSL 1.1.1f, but presents in OpenSSL 3.0 by default.
Use of the SSL_OP_IGNORE_UNEXPECTED_EOF option added in OpenSSL 3.0 allows
to set a compatible behaviour to return SSL_ERROR_ZERO_RETURN:
https://git.openssl.org/?p=openssl.git;a=commitdiff;h=09b90e0
See for additional details: https://github.com/openssl/openssl/issues/11381
The OPENSSL_SUPPRESS_DEPRECATED macro is used to suppress deprecation warnings.
This covers Session Tickets keys, SSL Engine, DH low level API for DHE ciphers.
Unlike OPENSSL_API_COMPAT, it works well with OpenSSL built with no-deprecated.
In particular, it doesn't unhide various macros in OpenSSL includes, which are
meant to be hidden under OPENSSL_NO_DEPRECATED.
The only consumer is a callback function for SSL_CTX_set_tmp_rsa_callback()
deprecated in OpenSSL 1.1.0. Now the function is conditionally compiled too.
The latest HTTP/1.1 draft describes Transfer-Encoding in HTTP/1.0 as having
potentially faulty message framing as that could have been forwarded without
handling of the chunked encoding, and forbids processing subsequest requests
over that connection: https://github.com/httpwg/http-core/issues/879.
While handling of such requests is permitted, the most secure approach seems
to reject them.
The c->read->ready and c->write->ready flags might be reset during
the handshake, and not set again if the handshake was finished on
the other event. At the same time, some data might be read from
the socket during the handshake, so missing c->read->ready flag might
result in a connection hang, for example, when waiting for an SMTP
greeting (which was already received during the handshake).
Found by Sergey Kandaurov.
Previously, when cleaning up a QUIC stream in shutdown mode,
ngx_quic_shutdown_quic() was called, which could close the QUIC connection
right away. This could be a problem if the connection was referenced up the
stack. For example, this could happen in ngx_quic_init_streams(),
ngx_quic_close_streams(), ngx_quic_create_client_stream() etc.
With a typical HTTP/3 client the issue is unlikely because of HTTP/3 uni
streams which need a posted event to close. In this case QUIC connection
cannot be closed right away.
Now QUIC connection read event is posted and it will shut down the connection
asynchronously.
After receiving GOAWAY, client is not supposed to create new streams. However,
until client reads this frame, we allow it to create new streams, which are
gracefully rejected. To prevent client from abusing this algorithm, a new
limit is introduced. Upon reaching keepalive_requests * 2, server now closes
the entire QUIC connection claiming excessive load.
The "hq" mode is HTTP/0.9-1.1 over QUIC. The following limits are introduced:
- uni streams are not allowed
- keepalive_requests is enforced
- keepalive_time is enforced
In case of error, QUIC connection is finalized with 0x101 code. This code
corresponds to HTTP/3 General Protocol Error.
Previously, in-flight byte counter and congestion window were properly
maintained, but the limit was not properly implemented.
Now a new datagram is sent only if in-flight byte counter is less than window.
The limit is datagram-based, which means that a single datagram may lead to
exceeding the limit, but the next one will not be sent.
Previously, the error was ignored leading to unnecessary retransmits.
Now, unsent frames are returned into output queue, state is reset, and
timer is started for the next send attempt.
As per quic-http-34:
Endpoints SHOULD create the HTTP control stream as well as the
unidirectional streams required by mandatory extensions (such as the
QPACK encoder and decoder streams) first, and then create additional
streams as allowed by their peer.
Previously, client could create and destroy additional uni streams unlimited
number of times before creating mandatory streams.
OpenSSL is known to provide read keys for an encryption level before the
level is active in TLS, following the old BoringSSL API. In BoringSSL,
it was then fixed to defer releasing read keys until QUIC may use them.
The directive enables usage of UDP segmentation offloading by quic.
By default, gso is disabled since it is not always operational when
detected (depends on interface configuration).
To improve output performance, UDP segmentation offloading is used
if available. If there is a significant amount of data in an output
queue and path is verified, QUIC packets are not sent one-by-one,
but instead are collected in a buffer, which is then passed to kernel
in a single sendmsg call, using UDP GSO. Such method greatly decreases
number of system calls and thus system load.
Additionally, the ngx_init_srcaddr_cmsg() function is introduced which
initializes control message with connection local address.
The NGX_HAVE_ADDRINFO_CMSG macro is defined when at least one of methods
to deal with corresponding control message is available.
The ngx_wsasend_chain() and ngx_wsarecv_chain() functions were
modified to use only preallocated memory, and the number of
preallocated wsabufs was increased to 64.
Sometimes, QUIC packets need to be of certain (or minimal) size. This is
achieved by adding PADDING frames. It is possible, that adding padding will
affect header size, thus forcing us to recalculate padding size once more.
In d1bde5c3c5d2, the number of preallocated iovec's for ngx_readv_chain()
was increased. Still, in some setups, the function might allocate memory
for iovec's from a connection pool, which is only freed when closing the
connection.
The ngx_readv_chain() function was modified to use only preallocated
memory, similarly to the ngx_writev_chain() change in 8e903522c17a.
Renamed header -> field per quic-qpack naming convention, in particular:
- Header Field -> Field Line
- Header Block -> (Encoded) Field Section
- Without Name Reference -> With Literal Name
- Header Acknowledgement -> Section Acknowledgment
Control characters (0x00-0x1f, 0x7f) and space are not expected to appear
in the Host header. Requests with such characters in the Host header are
now unconditionally rejected.
In 71edd9192f24 logging of invalid headers which were rejected with the
NGX_HTTP_PARSE_INVALID_HEADER error was restricted to just the "client
sent invalid header line" message, without any attempts to log the header
itself.
This patch returns logging of the header up to the invalid character and
the character itself. The r->header_end pointer is now properly set
in all cases to make logging possible.
The same logging is also introduced when parsing headers from upstream
servers.
Control characters (0x00-0x1f, 0x7f), space, and colon were never allowed in
header names. The only somewhat valid use is header continuation which nginx
never supported and which is explicitly obsolete by RFC 7230.
Previously, such headers were considered invalid and were ignored by default
(as per ignore_invalid_headers directive). With this change, such headers
are unconditionally rejected.
It is expected to make nginx more resilient to various attacks, in particular,
with ignore_invalid_headers switched off (which is inherently unsecure, though
nevertheless sometimes used in the wild).
Control characters (0x00-0x1f, 0x7f) were never allowed in URIs, and must
be percent-encoded by clients. Further, these are not believed to appear
in practice. On the other hand, passing such characters might make various
attacks possible or easier, despite the fact that currently allowed control
characters are not significant for HTTP request parsing.
From now on, requests with spaces in URIs are immediately rejected rather
than allowed. Spaces were allowed in 31e9677b15a1 (0.8.41) to handle bad
clients. It is believed that now this behaviour causes more harm than
good.
Per RFC 3986 only the following characters are allowed in URIs unescaped:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
And "%" can appear as a part of escaping itself. The following
characters are not allowed and need to be escaped: %00-%1F, %7F-%FF,
" ", """, "<", ">", "\", "^", "`", "{", "|", "}".
Not escaping ">" is known to cause problems at least with MS Exchange (see
http://nginx.org/pipermail/nginx-ru/2010-January/031261.html) and in
Tomcat (ticket #2191).
The patch adds escaping of the following chars in all URI parts: """, "<",
">", "\", "^", "`", "{", "|", "}". Note that comments are mostly preserved
to outline important characters being escaped.
HTTP clients are not allowed to generate such requests since Transfer-Encoding
introduction in RFC 2068, and they are not expected to appear in practice
except in attempts to perform a request smuggling attack. While handling of
such requests is strictly defined, the most secure approach seems to reject
them.
No valid CONNECT requests are expected to appear within nginx, since it
is not a forward proxy. Further, request line parsing will reject
proper CONNECT requests anyway, since we don't allow authority-form of
request-target. On the other hand, RFC 7230 specifies separate message
length rules for CONNECT which we don't support, so make sure to always
reject CONNECTs to avoid potential abuse.
Previously, TRACE requests were rejected before parsing Transfer-Encoding.
This is not important since keepalive is not enabled at this point anyway,
though rejecting such requests after properly parsing other headers is
less likely to cause issues in case of further code changes.
After 2096b21fcd10, a single RST_STREAM(NO_ERROR) may not result in an error.
This change removes several unnecessary ctx->type checks for such a case.
As per quic-http-34, these are the cases when this error should be generated:
If an endpoint receives a second SETTINGS frame
on the control stream, the endpoint MUST respond with a connection
error of type H3_FRAME_UNEXPECTED
SETTINGS frames MUST NOT be sent on any stream other than the control
stream. If an endpoint receives a SETTINGS frame on a different
stream, the endpoint MUST respond with a connection error of type
H3_FRAME_UNEXPECTED.
A client MUST NOT send a PUSH_PROMISE frame. A server MUST treat the
receipt of a PUSH_PROMISE frame as a connection error of type
H3_FRAME_UNEXPECTED; see Section 8.
The MAX_PUSH_ID frame is always sent on the control stream. Receipt
of a MAX_PUSH_ID frame on any other stream MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED.
Receipt of an invalid sequence of frames MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED; see Section 8. In
particular, a DATA frame before any HEADERS frame, or a HEADERS or
DATA frame after the trailing HEADERS frame, is considered invalid.
A CANCEL_PUSH frame is sent on the control stream. Receiving a
CANCEL_PUSH frame on a stream other than the control stream MUST be
treated as a connection error of type H3_FRAME_UNEXPECTED.
The GOAWAY frame is always sent on the control stream.
The quic-http-34 is ambiguous as to what error should be generated for the
first frame in control stream:
Each side MUST initiate a single control stream at the beginning of
the connection and send its SETTINGS frame as the first frame on this
stream. If the first frame of the control stream is any other frame
type, this MUST be treated as a connection error of type
H3_MISSING_SETTINGS.
If a DATA frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED.
If a HEADERS frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED.
Previously, H3_FRAME_UNEXPECTED had priority, but now H3_MISSING_SETTINGS has.
The arguments in the spec sound more compelling for H3_MISSING_SETTINGS.
- Function ngx_quic_control_flow() is introduced. This functions does
both MAX_DATA and MAX_STREAM_DATA flow controls. The function is called
from STREAM and RESET_STREAM frame handlers. Previously, flow control
was only accounted for STREAM. Also, MAX_DATA flow control was not accounted
at all.
- Function ngx_quic_update_flow() is introduced. This function advances flow
control windows and sends MAX_DATA/MAX_STREAM_DATA. The function is called
from RESET_STREAM frame handler, stream cleanup handler and stream recv()
handler.
Recent fixes to SSL shutdown with lingering close (554c6ae25ffc, 1.19.5)
broke logging of SSL variables. To make sure logging of SSL variables
works properly, avoid freeing c->ssl when doing an SSL shutdown before
lingering close.
Reported by Reinis Rozitis
(http://mailman.nginx.org/pipermail/nginx/2021-May/060670.html).
Instead of calling SSL_free() with each return point, introduced a single
place where cleanup happens. As a positive side effect, this fixes two
potential memory leaks on ngx_handle_read_event() and ngx_handle_write_event()
errors where there were no SSL_free() calls (though unlikely practical,
as errors there are only expected to happen due to bugs or kernel issues).
When starting processing a new encoder instruction, the header state is not
memzero'ed because generally it's burdensome. If the header value is empty,
this resulted in inserting a stale value left from the previous instruction.
Based on a patch by Zhiyong Sun.
On Linux, SO_REUSEADDR allows completely duplicate UDP sockets, so using
SO_REUSEADDR when testing configuration results in packets being dropped
if there is an existing traffic on the sockets being tested (ticket #2187).
While dropped packets are expected with UDP, it is better to avoid this
when possible.
With this change, SO_REUSEADDR is no longer set on datagram sockets when
testing configuration.
Since we anyway do not set SO_REUSEPORT when testing configuration
(see ecb5cd305b06), trying to open additional sockets does not make much
sense, as all these additional sockets are expected to result in EADDRINUSE
errors from bind(). On the other hand, there are reports that trying
to open these sockets takes significant time under load: total configuration
testing time greater than 15s was observed in ticket #2188, compared to less
than 1s without load.
With this change, no additional sockets are opened during testing
configuration.
As per quic-transport 34, FINAL_SIZE_ERROR is generated if an endpoint received
a STREAM frame or a RESET_STREAM frame containing a final size that was lower
than the size of stream data that was already received.
Since nginx always uses exactly one entry in the question section of
a DNS query, and never uses compression pointers in this entry, parsing
of a DNS response in ngx_resolver_process_response() does not expect
compression pointers to appear in the question section of the DNS
response. Indeed, compression pointers in the first name of a DNS response
hardly make sense, do not seem to be allowed by RFC 1035 (which says
"a pointer to a prior occurance of the same name", note "prior"), and
were never observed in practice.
Added an explicit check to ngx_resolver_process_response()'s parsing
of the question section to properly report an error if compression pointers
nevertheless appear in the question section.
Instead of checking on each label if we need to place a dot or not,
now it always adds a dot after a label, and reduces the resulting
length afterwards.
Previously, anything with any of the two high bits set were interpreted
as compression pointers. This is incorrect, as RFC 1035 clearly states
that "The 10 and 01 combinations are reserved for future use". Further,
the 01 combination is actually allocated for EDNS extended label type
(see RFC 2671 and RFC 6891), not really used though.
Fix is to reject unrecognized label types rather than misinterpreting
them as compression pointers.
It is believed to be harmless, and in the worst case it uses some
uninitialized memory as a part of the compression pointer length,
eventually leading to the "name is out of DNS response" error.
Generic function ngx_quic_order_bufs() is introduced. This function creates
and maintains a chain of buffers with holes. Holes are marked with b->sync
flag. Several buffers and holes in this chain may share the same underlying
memory buffer.
When processing STREAM frames with this function, frame data is copied only
once to the right place in the stream input chain. Previously data could
be copied twice. First when buffering an out-of-order frame data, and then
when filling stream buffer from ordered frame queue. Now there's only one
data chain for both tasks.
When variables are used in ssl_certificate or ssl_certificate_key, a request
is created in the certificate callback to evaluate the variables, and then
freed. Freeing it, however, updates c->log->action to "closing request",
resulting in confusing error messages like "client timed out ... while
closing request" when a client times out during the SSL handshake.
Fix is to restore c->log->action after calling ngx_http_free_request().
According to profiling, those two are among most frequently called,
so inlining is generally useful, and unrolling should help with it.
Further, this fixes undefined behaviour seen with invalid values.
Inspired by Yu Liu.
Similarly to smtpd_hard_error_limit in Postfix and smtp_max_unknown_commands
in Exim, specifies the number of errors after which the connection is closed.
The change is mostly the same as the SMTP one (04e43d03e153 and 3f5d0af4e40a),
and ensures that nginx is able to properly handle or reject multiple IMAP
commands. The s->cmd field is not really used and set for consistency.
Non-synchronizing literals handling in invalid/unknown commands is limited,
so when a non-synchronizing literal is detected at the end of a discarded
line, the connection is closed.
Only "A-Za-z0-9-._" characters now allowed (which is stricter than what
RFC 3501 requires, but expected to be enough for all known clients),
and tags shouldn't be longer than 32 characters.
Previously, s->backslash was set if any of the arguments was a quoted
string with a backslash character. After successful command parsing
this resulted in all arguments being filtered to remove backslashes.
This is, however, incorrect, as backslashes should not be removed from
IMAP literals. For example:
S: * OK IMAP4 ready
C: a01 login {9}
S: + OK
C: user\name "pass\"word"
S: * BAD internal server error
resulted in "Auth-User: username" instead of "Auth-User: user\name"
as it should.
Fix is to apply backslash filtering on per-argument basis during parsing.
As discussed in the previous change, s->arg_start handling in the "done"
labels of ngx_mail_pop3_parse_command(), ngx_mail_imap_parse_command(),
and ngx_mail_smtp_parse_command() is wrong: s->arg_start cannot be
set there, as it is handled and cleared on all code paths where the
"done" labels are reached. The relevant code is dead and now removed.
Previously, s->arg_start was left intact after invalid IMAP commands,
and this might result in an argument incorrectly added to the following
command. Similarly, s->backslash was left intact as well, leading
to unneeded backslash removal.
For example (LFs from the client are explicitly shown as "<LF>"):
S: * OK IMAP4 ready
C: a01 login "\<LF>
S: a01 BAD invalid command
C: a0000000000\2 authenticate <LF>
S: a00000000002 aBAD invalid command
The backslash followed by LF generates invalid command with s->arg_start
and s->backslash set, the following command incorrectly treats anything
from the old s->arg_start to the space after the command as an argument,
and removes the backslash from the tag. If there is no space, s->arg_end
will be NULL.
Both things seem to be harmless though. In particular:
- This can be used to provide an incorrect argument to a command without
arguments. The only command which seems to look at the single argument
is AUTHENTICATE, and it checks the argument length before trying to
access it.
- Backslash removal uses the "end" pointer, and stops due to "src < end"
condition instead of scanning all the process memory if s->arg_end is
NULL (and arg[0].len is huge).
- There should be no backslashes in unquoted strings.
An obvious fix is to clear s->arg_start and s->backslash on invalid commands,
similarly to how it is done in POP3 parsing (added in 810:e3aa8f305d21) and
SMTP parsing.
This, however, makes it clear that s->arg_start handling in the "done"
label is wrong: s->arg_start cannot be legitimately set there, as it
is expected to be cleared in all possible cases when the "done" label is
reached. The relevant code is dead and will be removed by the following
change.
The change is mostly the same as the SMTP one (04e43d03e153 and 3f5d0af4e40a),
and ensures that nginx is able to properly handle or reject multiple POP3
commands, as required by the PIPELINING capability (RFC 2449). The s->cmd
field is not really used and set for consistency.
There is no need to scan buffer from s->buffer->pos, as we already scanned
the buffer till "p" and wasn't able to find an LF.
There is no real need for this change in SMTP, since it is at most a
microoptimization of a non-common code path. Similar code in IMAP, however,
will have to start scanning from "p" to be correct, since there can be
newlines in IMAP literals.
Previously, if an invalid SMTP command was split between reads, nginx failed
to wait for LF before returning an error, and interpreted the rest of the
command received later as a separate command.
The sw_invalid state in ngx_mail_smtp_parse_command(), introduced in
04e43d03e153, did not work, since ngx_mail_smtp_auth_state() clears
s->state when returning an error due to NGX_MAIL_PARSE_INVALID_COMMAND.
And not clearing s->state will introduce another problem: the rest
of the command would trigger duplicate error when rest of the command is
received.
Fix is to return NGX_AGAIN from ngx_mail_smtp_parse_command() until full
command is received.
Previously, if there were some pipelined SMTP data in the buffer when
a proxied connection with the backend was established, nginx called
ngx_mail_proxy_handler() to send these data, and not tried to send the
response to the last command. In most cases, this response was later sent
along with the response to the pipelined command, but if for some reason
client decides to wait for the response before finishing the next command
this might result in a connection hang.
Fix is to always call ngx_mail_proxy_handler() to send the response, and
additionally post an event to send the pipelined data if needed.
When using server push, a segfault occured because
ngx_http_v3_create_push_request() accessed ngx_http_v3_session_t object the old
way. Prior to 9ec3e71f8a61, HTTP/3 session was stored directly in c->data.
Now it's referenced by the v3_session field of ngx_http_connection_t.
With this change, it is now possible to use ngx_conf_merge_ptr_value()
to merge complex values. This change follows much earlier changes in
ngx_conf_merge_ptr_value() and ngx_conf_set_str_array_slot()
in 1452:cd586e963db0 (0.6.10) and 1701:40d004d95d88 (0.6.22), and the
change in ngx_conf_set_keyval_slot() (7728:485dba3e2a01, 1.19.4).
To preserve compatibility with existing 3rd party modules, both NULL
and NGX_CONF_UNSET_PTR are accepted for now.
Client IDs cannot be reused on different paths. This change allows to reuse
client id previosly seen on the same path (but with different dcid) in case
when no unused client IDs are available.
According to quic-transport, 9.1:
PATH_CHALLENGE, PATH_RESPONSE, NEW_CONNECTION_ID, and PADDING frames
are "probing frames", and all other frames are "non-probing frames".
Previously it was parsed in ngx_http_v3_streams.c, while the streams were
parsed in ngx_http_v3_parse.c. Now all parsing is done in one file. This
simplifies parsing API and cleans up ngx_http_v3_streams.c.
Previously, an ngx_http_v3_connection_t object was created for HTTP/3 and
then assinged to c->data instead of the generic ngx_http_connection_t object.
Now a direct reference is added to ngx_http_connection_t, which is less
confusing and does not require a flag for http3.
It's used instead of accessing c->quic->parent->data directly. Apart from being
simpler, it allows to change the way session is stored in the future by changing
the macro.
Now ngx_http_v3_ack_header() and ngx_http_v3_inc_insert_count() always generate
decoder error. Our implementation does not use dynamic tables and does not
expect client to send Section Acknowledgement or Insert Count Increment.
Stream Cancellation, on the other hand, is allowed to be sent anyway. This is
why ngx_http_v3_cancel_stream() does not return an error.
The patch adds proper transitions between multiple networking addresses that
can be used by a single quic connection. New networking paths are validated
using PATH_CHALLENGE/PATH_RESPONSE frames.
Due to structure's alignment, some uninitialized memory contents may have
been passed between processes.
Zeroing was removed in 0215ec9aaa8a.
Reported by Johnny Wang.
7.2.1:
If a DATA frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED;
7.2.2:
If a HEADERS frame is received on a control stream, the recipient MUST
respond with a connection error (Section 8) of type H3_FRAME_UNEXPECTED.
With SMTP pipelining, ngx_mail_read_command() can be called with s->buffer
without any space available, to parse additional commands received to the
buffer on previous calls. Previously, this resulted in recv() being called
with zero length, resulting in zero being returned, which was interpreted
as a connection close by the client, so nginx silently closed connection.
Fix is to avoid calling c->recv() if there is no free space in the buffer,
but continue parsing of the already received commands.
Currently both names are used which is confusing. Historically these were
different objects, but now it's the same one. The name qs (quic stream) makes
more sense than sn (stream node).
PATH_RESPONSE was explicitly forbidden in 0-RTT since at least draft-22, but
the Frame Types table was not updated until recently while in IESG evaluation.
Stop including QUIC headers with no user-serviceable parts inside.
This allows to provide a much cleaner QUIC interface. To cope with that,
ngx_quic_derive_key() is now explicitly exported for v3 and quic modules.
Additionally, this completely hides the ngx_quic_keys_t internal type.
The "ngx_event_quic.h" header file now contains only public definitions,
used by modules. All internal definitions are moved into
the "ngx_event_quic_connection.h" header file.
The function correctly cleans up resources in case of failure to create
initial server id: it removes previously created udp node for odcid from
listening rbtree.
Currently listener contains rbtree with multiple nodes for single QUIC
connection: each corresponding to specific server id. Each udp node points
to same ngx_connection_t, which points to QUIC connection via c->udp field.
Thus when an event handler is called, it only gets ngx_connection_t with
c->udp pointing to QUIC connection. This makes it hard to obtain actual
node which was used to dispatch packet (it requires to repeat DCID lookup).
Additionally, ngx_quic_connection_t->udp field is only needed to keep a
pointer in c->udp. The node is not added into the tree and does not carry
useful information.
Sometimes it is required to process datagram properties at higher level (i.e.
QUIC is interested in source address which may change and IP options). The
patch adds ngx_udp_dgram_t structure used to pass packet-related information
in c->udp.
Previously, when a new datagram arrived, data were copied from the UDP layer
to the QUIC layer via c->recv() interface. Now UDP buffer is accessed
directly.
OpenSSL 3.0 started to require HKDF-Extract output PRK length pointer
used to represent the amount of data written to contain the length of
the key buffer before the call. EVP_PKEY_derive() documents this.
See HKDF_Extract() internal implementation update in this change:
https://github.com/openssl/openssl/commit/5a285ad
The function ngx_quic_shutdown_connection() waits until all non-cancelable
streams are closed, and then closes the connection. In HTTP/3 cancelable
streams are all unidirectional streams except push streams.
The function is called from HTTP/3 when client reaches keepalive_requests.
In case of long header packets, dcid length was not read correctly.
While there, macros to parse uint64 was fixed as well as format specifiers
to print it in debug mode.
Thanks to Gao Yan <gaoyan09@baidu.com>.
As per quic-transport-34:
An endpoint also restarts its idle timer when sending an ack-eliciting
packet if no other ack-eliciting packets have been sent since last receiving
and processing a packet.
Previously, the timer was set for any packet.
The structure is used to parse an HTTP/3 request. An object of this type is
added to ngx_http_request_t instead of h3_parse generic pointer.
Also, the new field is located outside of the request ephemeral zone to keep it
safe after request headers are parsed.
The "max_ack_delay", "ack_delay_exponent", and "max_udp_payload_size"
transport parameters were not communicated to client.
The "disable_active_migration" and "active_connection_id_limit"
parameters were not saved into zero-rtt context.
18.1. Reserved Transport Parameters
Transport parameters with an identifier of the form "31 * N + 27" for
integer values of N are reserved to exercise the requirement that
unknown transport parameters be ignored. These transport parameters
have no semantics, and can carry arbitrary values.
Setting the timer is brought into compliance with quic-recovery-34. Now it's
set from a single function ngx_quic_set_lost_timer() that takes into account
both loss detection and PTO. The following issues are fixed with this change:
- when in loss detection mode, discarding a context could turn off the
timer forever after switching to the PTO mode
- when in loss detection mode, sending a packet resulted in rescheduling the
timer as if it's always in the PTO mode
As per quic-transport-33:
An endpoint MUST acknowledge all ack-eliciting Initial and Handshake
packets immediately
If a packet carrying Initial or Handshake ACK was lost, a non-immediate ACK
should not be sent later. Instead, client is expected to send a new packet
to acknowledge.
Sending non-immediate ACKs for Initial packets can cause the client to
generate an inflated RTT sample.
The token generation in QUIC is reworked. Single host key is used to generate
all required keys of needed sizes using HKDF.
The "quic_stateless_reset_token_key" directive is removed. Instead, the
"quic_host_key" directive is used, which reads key from file, or sets it
to random bytes if not specified.
The flag was introduced to create type-aware CONNECTION_CLOSE frames,
and now is replaced with frame type information, directly accessible.
Notably, this fixes type logging for received frames in b3d9e57d0f62.
The flag is used in ngx_http_finalize_connection() to switch client connection
to the keepalive mode. Since eaea7dac3292 this code is not executed for HTTP/3
which allows us to revert the change and get back to the default branch code.
The change reduces diff to the default branch for
src/http/ngx_http_request_body.c.
Also, client Content-Length, if present, is now checked against the real body
size sent by client.
- split ngx_quic_process_packet() in two functions with the second one called
ngx_quic_process_payload() in charge of decrypring and handling the payload
- renamed ngx_quic_payload_handler() to ngx_quic_handle_frames()
- moved error cleanup from ngx_quic_input() to ngx_quic_process_payload()
- moved handling closed connection from ngx_quic_handle_frames() to
ngx_quic_process_payload()
- minor fixes
Previously, quic connection object was created when Retry packet was sent.
This is neither necessary nor convenient, and contradicts the idea of retry:
protecting from bad clients and saving server resources.
Now, the connection is not created, token is verified cryptographically
instead of holding it in connection.
This function should be called at the end of an event handler to prepare the
event for the next handler call. Particularly, the "active" flag is set or
cleared depending on data availability.
With this call missing in one code path, read handler was not called again
after handling the initial part of the client request, if the request was too
big to fit into a single STREAM frame.
Now ngx_handle_read_event() is called in this code path. Also, read timer is
restarted.
A header with the name containing null, CR, LF, colon or uppercase characters,
is now considered an error. A header with the value containing null, CR or LF,
is also considered an error.
Also, header is considered invalid unless its name only contains lowercase
characters, digits, minus and optionally underscore. Such header can be
optionally ignored.
- :method, :path and :scheme are expected exactly once and not empty
- :method and :scheme character validation is added
- :authority cannot appear more than once
Notably, the version negotiation table is updated to reject draft-33/QUICv1
(which requires a new TLS codepoint) unless explicitly asked to built with.
The quic kernel bpf helper inspects packet payload for DCID, extracts key
and routes the packet into socket matching the key.
Due to reuseport feature, each worker owns a personal socket, which is
identified by the same key, used to create DCID.
BPF objects are locked in RAM and are subject to RLIMIT_MEMLOCK.
The "ulimit -l" command may be used to setup proper limits, if maps
cannot be created with EPERM or updated with ETOOLONG.
Previously, when processing client ACK, rtt could be calculated for a packet
different than the largest if it was missing in the sent chain. Even though
this is an unlikely situation, rtt based on a different packet could be larger
than needed leading to bigger pto timeout and performance degradation.
Previously, this only worked for Application level because before
quic-transport-30, there were the following constraints:
Because the receiver doesn't use the ACK Delay for Initial and Handshake
packets, a sender SHOULD send a value of 0.
When adjusting an RTT sample using peer-reported acknowledgement delays, an
endpoint ... MUST ignore the ACK Delay field of the ACK frame for packets
sent in the Initial and Handshake packet number space.
Now initial output packet is not padded anymore if followed by a handshake
packet. If the datagram is still not big enough to satisfy minimum size
requirements, handshake packet is padded.
The patch replaces c->send() occurences with c->send_chain(), because the
latter accounts for the local address, which may be different if the wildcard
listener is used.
Previously, server sent response to client using address different from
one client connected to.
Notably, this fixes an issue with Chrome that can emit a "certificate_unknown"
alert during the SSL handshake where c->ssl->no_wait_shutdown is not yet set.
The filter is responsible for creating HTTP/3 response header and body.
The change removes differences to the default branch for
ngx_http_chunked_filter_module and ngx_http_header_filter_module.
Instead, appropriate format specifier for hexadecimal is used
in ngx_log_debug().
The STREAM frame "data" debug is moved into ngx_quic_log_frame(), similar
to all other frame fields debug.
Header value returned from the HTTP parser is expected to be null-terminated or
have a spare byte after the value bytes. When an empty header value was passed
by client in a literal header representation, neither was true. This could
result in segfault. The fix is to assign a literal empty null-terminated
string in this case.
Thanks to Andrey Kolyshkin.
The largely duplicate type-specific functions ngx_quic_parse_initial_header(),
ngx_quic_parse_handshake_header(), and a missing one for 0-RTT, were merged.
The new order of functions listed in ngx_event_quic_transport.c reflects this.
|_ ngx_quic_parse_long_header - version-invariant long header fields
\_ ngx_quic_supported_version - a helper to decide we can go further
\_ ngx_quic_parse_long_header_v1 - QUICv1-specific long header fields
0-RTT packets previously appeared as Handshake are now logged as appropriate:
*1 quic packet rx long flags:db version:ff00001d
*1 quic packet rx early len:870
Logging SCID/DCID is no longer duplicated as were seen with Initial packets.
If trailers were missing and a chain carrying the last_buf flag had no data
in it, then last HTTP/1 chunk was broken. The problem was introduced while
implementing HTTP/3 response body generation.
The change fixes the issue and reduces diff to the mainline nginx.
Previously, if quic_stateless_reset_token_key was empty or unspecified,
initial stateless reset token was not generated. However subsequent tokens
were generated with empty key, which resulted in error with certain SSL
libraries, for example OpenSSL.
Now a random 32-byte stateless reset token key is generated if none is
specified in the configuration. As a result, stateless reset tokens are now
generated for all server ids.
Previously new dcid was generated in the same memory that was allocated for
qc->dcid when creating the QUIC connection. However this memory was also
referenced by initial_source_connection_id and retry_source_connection_id
transport parameters. As a result these parameters changed their values after
retry which broke the protocol.
A negotiated version is decoupled from NGX_QUIC_VERSION and, if supported,
now stored in c->quic->version after packets processing. It is then used
to create long header packets. Otherwise, the list of supported versions
(which may be many now) is sent in the Version Negotiation packet.
All packets in the connection are expected to have the same version.
Incoming packets with mismatched version are now rejected.
A number of unsigned variables has a special value, usually -1 or some maximum,
which produces huge numeric value in logs and makes them hard to read.
In order to distinguish such values in log, they are casted to the signed type
and printed as literal '-1'.
The client address validation didn't complete with a valid token,
which was broken after packet processing refactoring in d0d3fc0697a0.
An invalid or expired token was treated as a connection error.
Now we proceed as outlined in draft-ietf-quic-transport-32,
section 8.1.3 "Address Validation for Future Connections" below,
which is unlike validating the client address using Retry packets.
When a server receives an Initial packet with an address validation
token, it MUST attempt to validate the token, unless it has already
completed address validation. If the token is invalid then the
server SHOULD proceed as if the client did not have a validated
address, including potentially sending a Retry.
The connection is now closed in this case on internal errors only.
All key handling functionality is moved into ngx_quic_protection.c.
Public structures from ngx_quic_protection.h are now private and new
methods are available to manipulate keys.
A negotiated cipher is cached in QUIC connection from the set secret callback
to avoid calling SSL_get_current_cipher() on each encrypt/decrypt operation.
This also reduces the number of unwanted c->ssl->connection occurrences.
Previously, tx ACK frames held ranges in an array of ngx_quic_ack_range_t,
while rx ACK frames held ranges in the serialized format. Now serialized format
is used for both types of frames.
The patch resets ctx->frames queue, which may contain frames. It was possible
that congestion or amplification limits prevented all frames to be sent.
Retransmitted frames could be accounted twice as inflight: first time in
ngx_quic_congestion_lost() called from ngx_quic_resend_frames(), and later
from ngx_quic_discard_ctx().
This accounts for the following change:
* Require expansion of datagrams to ensure that a path supports at
least 1200 bytes:
- During the handshake ack-eliciting Initial packets from the
server need to be expanded
All values are prefixed with name and separated from it using colon.
Multiple values are listed without commas in between.
Rationale: this greatly simplifies log parsing for analysis.
For application level packets, only every second packet is now acknowledged,
respecting max ack delay.
13.2.1 Sending ACK Frames
In order to assist loss detection at the sender, an endpoint SHOULD
generate and send an ACK frame without delay when it receives an ack-
eliciting packet either:
* when the received packet has a packet number less than another
ack-eliciting packet that has been received, or
* when the packet has a packet number larger than the highest-
numbered ack-eliciting packet that has been received and there are
missing packets between that packet and this packet.
13.2.2. Acknowledgement Frequency
A receiver SHOULD send an ACK frame after receiving at least two
ack-eliciting packets.
13.2.4. Limiting Ranges by Tracking ACK Frames
When a packet containing an ACK frame is sent, the largest
acknowledged in that frame may be saved. When a packet containing an
ACK frame is acknowledged, the receiver can stop acknowledging
packets less than or equal to the largest acknowledged in the sent
ACK frame.
The history of acknowledged packet is kept in send context as ranges.
Up to NGX_QUIC_MAX_RANGES ranges is stored.
As a result, instead of separate ack frames, single frame with ranges
is sent.
Per draft-ietf-quic-transport-32 on the topic:
: Similarly, a server MUST expand the payload of all UDP datagrams carrying
: ack-eliciting Initial packets to at least the smallest allowed maximum
: datagram size of 1200 bytes.
The history of acknowledged packet is kept in send context as ranges.
Up to NGX_QUIC_MAX_RANGES ranges is stored.
As a result, instead of separate ack frames, single frame with ranges
is sent.
Previously, this field was not set while creating a QUIC stream connection.
As a result, calling ngx_connection_local_sockaddr() led to getsockname()
bad descriptor error.
If client acknowledged an Initial packet with CRYPTO frame and then
sent another Initial packet containing duplicate CRYPTO again, this
could result in resending frames off the empty send queue.
The new "quic_stateless_reset_token_key" directive is added. It sets the
endpoint key used to generate stateless reset tokens and enables feature.
If the endpoint receives short-header packet that can't be matched to
existing connection, a stateless reset packet is generated with
a proper token.
If a valid stateless reset token is found in the incoming packet,
the connection is closed.
Example configuration:
http {
quic_stateless_reset_token_key "foo";
...
}
The flag is tied to the initial secret creation. The presence of c->quic
pointer is sufficient to enable execution of ngx_quic_close_quic().
The ngx_quic_new_connection() function now returns the allocated quic
connection object and the c->quic pointer is set by the caller.
If an early error occurs before secrets initialization (i.e. in cases
of invalid retry token or nginx exiting), it is still possible to
generate an error response by trying to initialize secrets directly
in the ngx_quic_send_cc() function.
Before the change such early errors failed to send proper connection close
message and logged an error.
An auxilliary ngx_quic_init_secrets() function is introduced to avoid
verbose call to ngx_quic_set_initial_secret() requiring local variable.
All packet header parsing is now performed by ngx_quic_parse_packet()
function, located in the ngx_quic_transport.c file.
The packet processing is centralized in the ngx_quic_process_packet()
function which decides if the packet should be accepted, ignored or
connection should be closed, depending on the connection state.
As a result of refactoring, behavior has changed in some places:
- minimal size of Initial packet is now always tested
- connection IDs are always tested in existing connections
- old keys are discarded on encryption level switch
Now flags are processed in ngx_quic_input(), and raw->pos points to the first
byte after the flags. Redundant checks from ngx_quic_parse_short_header() and
ngx_quic_parse_long_header() are removed.
Previously, when a packet was declared lost, another packet was sent with the
same frames. Now lost frames are moved to the output frame queue and push
event is posted. This has the advantage of forming packets with more frames
than before.
Also, the start argument is removed from the ngx_quic_resend_frames()
function as excess information.
Previously the default server configuration context was used until the
:authority or host header was parsed. This led to using the configuration
parameters like client_header_buffer_size or request_pool_size from the default
server rather than from the server selected by SNI.
Also, the switch to the right server log is implemented. This issue manifested
itself as QUIC stream being logged to the default server log until :authority
or host is parsed.
Initially, client certificate verification didn't work due to the missing
hc->ssl on a QUIC stream, which is started to be set in 7738:7f0981be07c4.
Then it was lost in 7999:0d2b2664b41c introducing "quic" listen parameter.
This change re-adds hc->ssl back for all QUIC connections, similar to SSL.
As per HTTP/3 draft 30, section 7.2.8:
Frame types that were used in HTTP/2 where there is no corresponding
HTTP/3 frame have also been reserved (Section 11.2.1). These frame
types MUST NOT be sent, and their receipt MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED.
As per HTTP/3 draft 29, section 4.1:
Frames of unknown types (Section 9), including reserved frames
(Section 7.2.8) MAY be sent on a request or push stream before,
after, or interleaved with other frames described in this section.
Also, trailers frame is now used as an indication of the request body end.
While for HTTP/1 unexpected eof always means an error, for HTTP/3 an eof right
after a DATA frame end means the end of the request body. For this reason,
since adding HTTP/3 support, eof no longer produced an error right after recv()
but was passed to filters which would make a decision. This decision was made
in ngx_http_parse_chunked() and ngx_http_v3_parse_request_body() based on the
b->last_buf flag.
Now that since 0f7f1a509113 (1.19.2) rb->chunked->length is a lower threshold
for the expected number of bytes, it can be set to zero to indicate that more
bytes may or may not follow. Now it's possible to move the check for eof from
parser functions to ngx_http_request_body_chunked_filter() and clean up the
parsing code.
Also, in the default branch, in case of eof, the following three things
happened, which were replaced with returning NGX_ERROR while implementing
HTTP/3:
- "client prematurely closed connection" message was logged
- c->error flag was set
- NGX_HTTP_BAD_REQUEST was returned
The change brings back this behavior for HTTP/1 as well as HTTP/3.
If a packet sent in response to an initial client packet was lost, then
successive client initial packets were dropped by nginx with the unexpected
dcid message logged. This was because the new DCID generated by the server was
not available to the client.
The check tested the total size of a packet header and unprotected packet
payload, which doesn't include the packet number length and expansion of
the packet protection AEAD. If the packet was corrupted, it could cause
false triggering of the condition due to unsigned type underflow leading
to a connection error.
Existing checks for the QUIC header and protected packet payload lengths
should be enough.
From quic-tls draft, section 5.4.2:
An endpoint MUST discard packets that are not long enough to contain
a complete sample.
The check includes the Packet Number field assumed to be 4 bytes long.
During long packet header parsing, pkt->len is updated with the Length
field value that is used to find next coalesced packets in a datagram.
For short packets it still contained the whole QUIC packet size.
This change uniforms packet length handling to always contain the total
length of the packet number and protected packet payload in pkt->len.
Previously STOP_SENDING was sent to client upon stream closure if rev->eof and
rev->error were not set. This was an indirect indication that no RESET_STREAM
or STREAM fin has arrived. But it is indeed possible that rev->eof is not set,
but STREAM fin has already been received, just not read out by the application.
In this case sending STOP_SENDING does not make sense and can be misleading for
some clients.
The peer may issue additional connection IDs up to the limit defined by
transport parameter "active_connection_id_limit", using NEW_CONNECTION_ID
frames, and retire such IDs using RETIRE_CONNECTION_ID frame.
It is required to distinguish internal errors from corrupted packets and
perform actions accordingly: drop the packet or close the connection.
While there, made processing of ngx_quic_decrypt() erorrs similar and
removed couple of protocol violation errors.
quic-transport
5.2:
Packets that are matched to an existing connection are discarded if
the packets are inconsistent with the state of that connection.
5.2.2:
Servers MUST drop incoming packets under all other circumstances.
The removal of QUIC packet protection depends on the largest packet number
received. When a garbage packet was received, the decoder still updated the
largest packet number from that packet. This could affect removing protection
from subsequent QUIC packets.
As per HTTP/3 draft 29, section 4.1:
When the server does not need to receive the remainder of the request,
it MAY abort reading the request stream, send a complete response, and
cleanly close the sending part of the stream.
On QUIC connections, SSL_shutdown() is used to call the send_alert callback
to send a CONNECTION_CLOSE frame. The reverse side is handled by other means.
At least BoringSSL doesn't differentiate whether this is a QUIC SSL method,
so waiting for the peer's close_notify alert should be explicitly disabled.
The logical quic connection state is tested by handler functions that
process corresponding types of packets (initial/handshake/application).
The packet is declined if state is incorrect.
No timeout is required for the input queue.
If a client attemtps to start a new connection with unsupported version,
a version negotiation packet is sent that contains a list of supported
versions (currently this is a single version, selected at compile time).
The function ngx_http_upstream_check_broken_connection() terminates the HTTP/1
request if client sends eof. For QUIC (including HTTP/3) the c->write->error
flag is now checked instead. This flag is set when the entire QUIC connection
is closed or STOP_SENDING was received from client.
Previously the request body DATA frame header was read by one byte because
filters were called only when the requested number of bytes were read. Now,
after 08ff2e10ae92 (1.19.2), filters are called after each read. More bytes
can be read at once, which simplifies and optimizes the code.
This also reduces diff with the default branch.
Previously, such packets weren't handled as the resulting zero remaining time
prevented setting the loss detection timer, which, instead, could be disarmed.
For implementation details, see quic-recovery draft 29, appendix A.10.
The PTO handler is split into separate PTO and loss detection handlers
that operate interchangeably depending on which timer should be set.
The present ngx_quic_lost_handler is now only used for packet loss detection.
It replaces ngx_quic_pto_handler if there are packets preceeding largest_ack.
Once there is no more such packets, ngx_quic_pto_handler is installed again.
Probes carry unacknowledged data previously sent in the oldest packet number,
one per each packet number space. That is, it could be up to two probes.
PTO backoff is now increased before scheduling next probes.
In particular, this prevents declaring packet number 0 as lost if
there aren't yet any acknowledgements in this packet number space.
For example, only Initial packets were acknowledged in handshake.
Previously a single STREAM frame was created for each buffer in stream output
chain which is wasteful with respect to memory. The following changes were
made in the stream send code:
- ngx_quic_stream_send_chain() no longer calls ngx_quic_stream_send() and got
a separate implementation that coalesces neighbouring buffers into a single
frame
- the new ngx_quic_stream_send_chain() respects the limit argument, which fixes
sendfile_max_chunk and limit_rate
- ngx_quic_stream_send() is reimplemented to call ngx_quic_stream_send_chain()
- stream frame size limit is moved out to a separate function
ngx_quic_max_stream_frame()
- flow control is moved out to a separate function ngx_quic_max_stream_flow()
- ngx_quic_stream_send_chain() is relocated next to ngx_quic_stream_send()
Creating client-initiated streams is moved from ngx_quic_handle_stream_frame()
to a separate function ngx_quic_create_client_stream(). This function is
responsible for creating streams with lower ids as well.
Also, simplified and fixed initial data buffering in
ngx_quic_handle_stream_frame(). It is now done before calling the initial
handler as the handler can destroy the stream.
Previously this function generated an error trying to figure out if client shut
down the write end of the connection. The reason for this error was that a
QUIC stream has no socket descriptor. However checking for eof is not the
right thing to do for an HTTP/3 QUIC stream since HTTP/3 clients are expected
to shut down the write end of the stream after sending the request.
Now the function handles QUIC streams separately. It checks if c->read->error
is set. The error flags for c->read and c->write are now set for all streams
when closing the QUIC connection instead of setting the pending_eof flag.
According to quic-transport draft 29, section 19.3.1:
The value of the Gap field establishes the largest packet number
value for the subsequent ACK Range using the following formula:
largest = previous_smallest - gap - 2
Thus, given a largest packet number for the range, the smallest value
is determined by the formula:
smallest = largest - ack_range
While here, changed min/max to uint64_t for consistency.
A QUIC stream could be destroyed by handler while in ngx_quic_stream_input().
To detect this, ngx_quic_find_stream() is used to check that it still exists.
Previously, a stream id was passed to this routine off the frame structure.
In case of stream cleanup, it is freed along with other frames belonging to
the stream on cleanup. Then, a cleanup handler reuses last frames to update
MAX_STREAMS and serve other purpose. Thus, ngx_quic_find_stream() is passed
a reused frame with zeroed out part pointed by stream_id. If a stream with
id 0x0 still exists, this leads to use-after-free.
The limits on active bidi and uni client streams are maintained at their
initial values initial_max_streams_bidi and initial_max_streams_uni by sending
a MAX_STREAMS frame upon each client stream closure.
Also, the following is changed for data arriving to non-existing streams:
- if a stream was already closed, such data is ignored
- when creating a new stream, all streams of the same type with lower ids are
created too
The ngx_http_perl_module module doesn't have a notion of including additional
search paths through --with-cc-opt, which results in compile error incomplete
type 'enum ssl_encryption_level_t' when building nginx without QUIC support.
The enum is visible from quic event headers and eventually pollutes ngx_core.h.
The fix is to limit including headers to compile units that are real consumers.
According to quic-transport draft 29, section 21.12.1.1:
Prior to validation, endpoints are limited in what they are able to
send. During the handshake, a server cannot send more than three
times the data it receives; clients that initiate new connections or
migrate to a new network path are limited.
The ngx_quic_queue_frame() functions puts a frame into send queue and
schedules a push timer to actually send data.
The patch adds tracking for data amount in the queue and sends data
immediately if amount of data exceeds limit.
Instead of timer-based retransmissions with constant packet lifetime,
this patch implements ack-based loss detection and probe timeout
for the cases, when no ack is received, according to the quic-recovery
draft 29.
The c->quic->retransmit timer is now called "pto".
The ngx_quic_retransmit() function is renamed to "ngx_quic_detect_lost()".
This is a preparation for the following patches.
According to the quic-recovery 29, Section 5: Estimating the Round-Trip Time.
Currently, integer arithmetics is used, which loses sub-millisecond accuracy.
Previously, the expression (ch & 0x7f) was promoted to a signed integer.
Depending on the platform, the size of this integer could be less than 8 bytes,
leading to overflow when handling the higher bits of the result. Also, sign
bit of this integer could be replicated when adding to the 64-bit st->value.
Previously errors led only to closing streams.
To simplify closing QUIC connection from a QUIC stream context, new macro
ngx_http_v3_finalize_connection() is introduced. It calls
ngx_quic_finalize_connection() for the parent connection.
The function finalizes QUIC connection with an application protocol error
code and sends a CONNECTION_CLOSE frame with type=0x1d.
Also, renamed NGX_QUIC_FT_CONNECTION_CLOSE2 to NGX_QUIC_FT_CONNECTION_CLOSE_APP.
Previously dynamic table was not functional because of zero limit on its size
set by default. Now the following changes enable it:
- new directives to set SETTINGS_QPACK_MAX_TABLE_CAPACITY and
SETTINGS_QPACK_BLOCKED_STREAMS
- send settings with SETTINGS_QPACK_MAX_TABLE_CAPACITY and
SETTINGS_QPACK_BLOCKED_STREAMS to the client
- send Insert Count Increment to the client
- send Header Acknowledgement to the client
- evict old dynamic table entries on overflow
- decode Required Insert Count from client
- block stream if Required Insert Count is not reached
Client streams may send literal strings which are now limited in size by the
new directive. The default value is 4096.
The directive is similar to HTTP/2 directive http2_max_field_size.
So that connections are protected from failing from on-path attacks.
Decryption failure of long packets used during handshake still leads
to connection close since it barely makes sense to handle them there.
A previously used undefined error code is now replaced with the generic one.
Note that quic-transport prescribes keeping connection intact, discarding such
QUIC packets individually, in the sense that coalesced packets could be there.
This is selectively handled in the next change.
The patch removes remnants of the old state tracking mechanism, which did
not take into account assimetry of read/write states and was not very
useful.
The encryption state now is entirely tracked using SSL_quic_read/write_level().
quic-transport draft 29:
section 7:
* authenticated negotiation of an application protocol (TLS uses
ALPN [RFC7301] for this purpose)
...
Endpoints MUST explicitly negotiate an application protocol. This
avoids situations where there is a disagreement about the protocol
that is in use.
section 8.1:
When using ALPN, endpoints MUST immediately close a connection (see
Section 10.3 of [QUIC-TRANSPORT]) with a no_application_protocol TLS
alert (QUIC error code 0x178; see Section 4.10) if an application
protocol is not negotiated.
Changes in ngx_quic_close_quic() function are required to avoid attempts
to generated and send packets without proper keys, what happens in case
of failed ALPN check.
quic-transport draft 29, section 14:
QUIC depends upon a minimum IP packet size of at least 1280 bytes.
This is the IPv6 minimum size [RFC8200] and is also supported by most
modern IPv4 networks. Assuming the minimum IP header size, this
results in a QUIC maximum packet size of 1232 bytes for IPv6 and 1252
bytes for IPv4.
Since the packet size can change during connection lifetime, the
ngx_quic_max_udp_payload() function is introduced that currently
returns minimal allowed size, depending on address family.
quic-tls, 8.2:
The quic_transport_parameters extension is carried in the ClientHello
and the EncryptedExtensions messages during the handshake. Endpoints
MUST send the quic_transport_parameters extension; endpoints that
receive ClientHello or EncryptedExtensions messages without the
quic_transport_parameters extension MUST close the connection with an
error of type 0x16d (equivalent to a fatal TLS missing_extension
alert, see Section 4.10).
This is a temporary workaround, proper retransmission mechanism based on
quic-recovery rfc draft is yet to be implemented.
Currently hardcoded value is too small for real networks. The patch
sets static PTO, considering rtt of ~333ms, what gives about 1s.
Also, if both are present, require that they have the same value. These
requirements are specified in HTTP/3 draft 28.
Current implementation of HTTP/2 treats ":authority" and "Host"
interchangeably. New checks only make sure at least one of these values is
present in the request. A similar check existed earlier and was limited only
to HTTP/1.1 in 38c0898b6df7.
The flags was originally added by 8f038068f4bc, and is propagated correctly
in the stream module. With QUIC introduction, http module now uses datagram
sockets as well, thus the fix.
Preserving pointers within the client buffer is not needed for HTTP/3 because
all data is either allocated from pool or static. Unlike with HTTP/1, data
typically cannot be referenced directly within the client buffer. Trying to
preserve NULLs or external pointers lead to broken pointers.
Also, reverted changes in ngx_http_alloc_large_header_buffer() not relevant
for HTTP/3 to minimize diff to mainstream.
New field r->parse_start is introduced to substitute r->request_start and
r->header_name_start for request length accounting. These fields only work for
this purpose in HTTP/1 because HTTP/1 request line and header line start with
these values.
Also, error logging is now fixed to output the right part of the request.
As per HTTP/3 draft 27, a request or response containing uppercase header
field names MUST be treated as malformed. Also, existing rules applied
when parsing HTTP/1 header names are also applied to HTTP/3 header names:
- null character is not allowed
- underscore character may or may not be treated as invalid depending on the
value of "underscores_in_headers"
- all non-alphanumeric characters with the exception of '-' are treated as
invalid
Also, the r->locase_header field is now filled while parsing an HTTP/3
header.
Error logging for invalid headers is fixed as well.
The first one parses pseudo-headers and is analagous to the request line
parser in HTTP/1. The second one parses regular headers and is analogous to
the header parser in HTTP/1.
Additionally, error handling of client passing malformed uri is now fixed.
The function ngx_http_parse_chunked() is also called from the proxy module to
parse the upstream response. It should always parse HTTP/1 body in this case.
According to quic-transport draft 28 section 10.3.1:
When sending CONNECTION_CLOSE, the goal is to ensure that the peer
will process the frame. Generally, this means sending the frame in a
packet with the highest level of packet protection to avoid the
packet being discarded. After the handshake is confirmed (see
Section 4.1.2 of [QUIC-TLS]), an endpoint MUST send any
CONNECTION_CLOSE frames in a 1-RTT packet. However, prior to
confirming the handshake, it is possible that more advanced packet
protection keys are not available to the peer, so another
CONNECTION_CLOSE frame MAY be sent in a packet that uses a lower
packet protection level.
There is no need in a separate type for the QUIC connection state.
The only state not found in the SSL library is NGX_QUIC_ST_UNAVAILABLE,
which is actually a flag used by the ngx_quic_close_quic() function
to prevent cleanup of uninitialized connection.
Sections 4.10.1 and 4.10.2 of quic transport describe discarding of initial
and handshake keys. Since the keys are discarded, we no longer need
to retransmit packets and corresponding queues should be emptied.
This patch removes previously added workaround that did not require
acknowledgement for initial packets, resulting in avoiding retransmission,
which is wrong because a packet could be lost and we have to retransmit it.
It was possible that retransmit timer was not set after the first
retransmission attempt, due to ngx_quic_retransmit() did not set
wait time properly, and the condition in retransmit handler was incorrect.
Section 17.2 and 17.3 of QUIC transport:
Fixed bit: Packets containing a zero value for this bit are not
valid packets in this version and MUST be discarded.
Reserved bit: An endpoint MUST treat receipt of a packet that has
a non-zero value for these bits, after removing both packet and
header protection, as a connection error of type PROTOCOL_VIOLATION.
When an error occurs, then c->quic->error field may be populated
with an appropriate error code, and the CONNECTION CLOSE frame will be
sent to the peer before the connection is closed. Otherwise, the error
treated as internal and INTERNAL_ERROR code is sent.
The pkt->error field is populated by functions processing packets to
indicate an error when it does not fit into pass/fail return status.
As per QUIC transport, the first flight of 0-RTT packets obviously uses same
Destination and Source Connection ID values as the client's first Initial.
The fix is to match 0-RTT against original DCID after it has been switched.
The ordered frame handler is always called for the existing stream, as it is
allocated from this stream. Instead of searching stream by id, pointer to the
stream node is passed.
The idea is to skip any zeroes that follow valid QUIC packet. Currently such
behavior can be only observed with Firefox which sends zero-padded initial
packets.
Now there's no need to annotate every frame in ACK-eliciting packet.
Sending ACK was moved to the first place, so that queueing ACK frame
no longer postponed up to the next packet after pushing STREAM frames.
+ added "quic" prefix to all error messages
+ rephrased some messages
+ removed excessive error logging from frame parser
+ added ngx_quic_check_peer() function to check proper source/destination
match and do it one place
- the ngx_quic_hexdump0() macro is renamed to ngx_quic_hexdump();
the original ngx_quic_hexdump() macro with variable argument is
removed, extra information is logged normally, with ngx_log_debug()
- all labels in hex dumps are prefixed with "quic"
- the hexdump format is simplified, length is moved forward to avoid
situations when the dump is truncated, and length is not shown
- ngx_quic_flush_flight() function contents is debug-only, placed under
NGX_DEBUG macro to avoid "unused variable" warnings from compiler
- frame names in labels are capitalized, similar to other places
+ all dumps are moved under one of the following macros (undefined by default):
NGX_QUIC_DEBUG_PACKETS
NGX_QUIC_DEBUG_FRAMES
NGX_QUIC_DEBUG_FRAMES_ALLOC
NGX_QUIC_DEBUG_CRYPTO
+ all QUIC debug messages got "quic " prefix
+ all input frames are reported as "quic frame in FOO_FRAME bar:1 baz:2"
+ all outgoing frames re reported as "quic frame out foo bar baz"
+ all stream operations are prefixed with id, like: "quic stream id 0x33 recv"
+ all transport parameters are prefixed with "quic tp"
(hex dump is moved to caller, to avoid using ngx_cycle->log)
+ packet flags and some other debug messages are updated to
include packet type
We always generate stream frames that have length. The 'len' member is used
during parsing incoming frames and can be safely ignored when generating
output.
There are following flags in quic connection:
closing - true, when a connection close is initiated, for whatever reason
draining - true, when a CC frame is received from peer
The following state machine is used for closing:
+------------------+
| I/HS/AD |
+------------------+
| | |
| | V
| | immediate close initiated:
| | reasons: close by top-level protocol, fatal error
| | + sends CC (probably with app-level message)
| | + starts close_timer: 3 * PTO (current probe timeout)
| | |
| | V
| | +---------+ - Reply to input with CC (rate-limited)
| | | CLOSING | - Close/Reset all streams
| | +---------+
| | | |
| V V |
| receives CC |
| | |
idle | |
timer | |
| V |
| +----------+ | - MUST NOT send anything (MAY send a single CC)
| | DRAINING | | - if not already started, starts close_timer: 3 * PTO
| +----------+ | - if not already done, close all streams
| | |
| | |
| close_timer fires
| |
V V
+------------------------+
| CLOSED | - clean up all the resources, drop connection
+------------------------+ state completely
The ngx_quic_close_connection() function gets an "rc" argument, that signals
reason of connection closing:
NGX_OK - initiated by application (i.e. http/3), follow state machine
NGX_DONE - timedout (while idle or draining)
NGX_ERROR - fatal error, destroy connection immediately
The PTO calculations are not yet implemented, hardcoded value of 5s is used.
The function is split into three:
ngx_quic_close_connection() itself cleans up all core nginx things
ngx_quic_close_quic() deals with everything inside c->quic
ngx_quic_close_streams() deals with streams cleanup
The quic and streams cleanup functions may return NGX_AGAIN, thus signalling
that cleanup is not ready yet, and the close cannot continue to next step.
The header size macros for long and short packets were fixed to provide
correct values in bytes.
Currently the sending code limits frames so they don't exceed max_packet_size.
But it does not account the case when a single frame can exceed the limit.
As a result of this patch, big payload (CRYPTO and STREAM) will be split
into a number of smaller frames that fit into advertised max_packet_size
(which specifies final packet size, after encryption).
chrome-unstable 83.0.4103.7 starts with Initial packet number 1.
I couldn't find a proper explanation besides this text in quic-transport:
An endpoint MAY skip packet numbers when sending
packets to detect this (Optimistic ACK Attack) behavior.
+ MAX_STREAM_DATA frame is sent when recv() is performed on stream
The new value is a sum of total bytes received by stream + free
space in a buffer;
The sending of MAX_STREM_DATA frame in response to STREAM_DATA_BLOCKED
frame is adjusted to follow the same logic as above.
+ MAX_DATA frame is sent when total amount of received data is 2x
of current limit. The limit is doubled.
+ Default values of transport parameters are adjusted to more meaningful
values:
initial stream limits are set to quic buffer size instead of
unrealistically small 255.
initial max data is decreased to 16 buffer sizes, in an assumption that
this is enough for a relatively short connection, instead of randomly
chosen big number.
All this allows to initiate a stable flow of streams that does not block
on stream/connection limits (tested with FF 77.0a1 and 100K requests)
Before the patch, full STREAM frame handling was delayed until the frame with
zero offset is received. Only node in the streams tree was created.
This lead to problems when such stream was deleted, in particular, it had no
handlers set for read events.
This patch creates new stream immediately, but delays data delivery until
the proper offset will arrive. This is somewhat similar to how accept()
operation works.
The ngx_quic_add_stream() function is no longer needed and merged into stream
handler. The ngx_quic_stream_input() now only handles frames for existing
streams and does not deal with stream creation.
Frames can still float in the following queues:
- crypto frames reordering queues (one per encryption level)
- moved crypto frames cleanup to the moment where all streams are closed
- stream frames reordering queues (one per packet number namespace)
- frames retransmit queues (one per packet number namespace)
Each stream node now includes incoming frames queue and sent/received counters
for tracking offset. The sent counter is not used, c->sent is used, not like
in crypto buffers, which have no connections.
If offset in CRYPTO frame doesn't match expected, following actions are taken:
a) Duplicate frames or frames within [0...current offset] are ignored
b) New data from intersecting ranges (starts before current_offset, ends
after) is consumed
c) "Future" frames are stored in a sorted queue (min offset .. max offset)
Once a frame is consumed, current offset is updated and the queue is inspected:
we iterate the queue until the gap is found and act as described
above for each frame.
The amount of data in buffered frames is limited by corresponding macro.
The CRYPTO and STREAM frame structures are now compatible: they share
the same set of initial fields. This allows to have code that deals with
both of this frames.
The ordering layer now processes the frame with offset and invokes the
handler when it can organise an ordered stream of data.
Quote: Conceptually, a packet number space is the context in which a packet
can be processed and acknowledged.
ngx_quic_namespace_t => ngx_quic_send_ctx_t
qc->ns => qc->send_ctx
ns->largest => send_ctx->largest_ack
The ngx_quic_ns(level) macro now returns pointer, not just index:
ngx_quic_get_send_ctx(c->quic, level)
ngx_quic_retransmit_ns() => ngx_quic_retransmit()
ngx_quic_output_ns() => ngx_quic_output_frames()
The offset in client CRYPTO frames is tracked in c->quic->crypto_offset_in.
This means that CRYPTO frames with non-zero offset are now accepted making
possible to finish handshake with client certificates that exceed max packet
size (if no reordering happens).
The c->quic->crypto_offset field is renamed to crypto_offset_out to avoid
confusion with tracking of incoming CRYPTO stream.
+ since number of ranges in unknown, provide a function to parse them once
again in handler to avoid memory allocation
+ ack handler now processes all ranges, not only the first
+ ECN counters are parsed and saved into frame if present
Such frames are grouped together in a switch and just ignored, instead of
closing the connection This may improve test coverage. All such frames
require acknowledgment.
The qc->closing flag is set when a connection close is initiated for the first
time.
No timers will be set if the flag is active.
TODO: this is a temporary solution to avoid running timer handlers after
connection (and it's pool) was destroyed. It looks like currently we have
no clear policy of connection closing in regard to timers.
Found with a previously received Initial packet with ACK only, which
instantiates a new connection but do not produce the handshake keys.
This can be triggered by a fairly well behaving client, if the server
stands behind a load balancer that stripped Initial packets exchange.
Found by F5 test suite.
This makes sending large number of bidirectional stream work within ngtcp2,
which doesn't bother sending optional STREAMS_BLOCKED when exhausted.
This also introduces tracking currently opened and maximum allowed streams.
Currently, the output is called periodically, each 200 ms to invoke
ngx_quic_output() that will push all pending frames into packets.
TODO: implement flags a-là Nagle & co (NO_DELAY/NO_PUSH...)
All frames collected to packet are moved into a per-namespace send queue.
QUIC connection has a timer which fires on the closest max_ack_delay time.
The frame is deleted from the queue when a corresponding packet is acknowledged.
The NGX_QUIC_MAX_RETRANSMISSION is a timeout that defines maximum length
of retransmission of a frame.
The quic->keys[4] array now contains secrets related to the corresponding
encryption level. All protection-level functions get proper keys and do
not need to switch manually between levels.
If early data is accepted, SSL_do_handshake() completes as soon as ClientHello
is processed. SSL_in_init() will report the handshake is still in progress.
Static buffers are used instead in functions where decryption takes place.
The pkt->plaintext points to the beginning of a static buffer.
The pkt->payload.data points to decrypted data actual start.
+ ngx_quic_encrypt():
- no longer accepts pool as argument
- pkt is 1st arg
- payload is passed as pkt->payload
- performs encryption to the specified static buffer
+ ngx_quic_create_long/short_packet() functions:
- single buffer for everything, allocated by caller
- buffer layout is: [ ad | payload | TAG ]
the result is in the beginning of buffer with proper length
- nonce is calculated on stack
- log is passed explicitly, pkt is 1st arg
- no more allocations inside
+ ngx_quic_create_long_header():
- args changed: no need to pass str_t
+ added ngx_quic_create_short_header()
+ Client-related errors (i.e. parsing) are done at INFO level
+ c->log->action is updated through the process of receiving, parsing.
handling packet/payload and generating frames/output.
For ngx_http_process_request() part to work, this required to set both
r->http_connection->ssl and c->ssl on a QUIC stream. To avoid damaging
global SSL object, ngx_ssl_shutdown() is managed to ignore QUIC streams.
+ ngx_quic_init_ssl_methods() is no longer there, we setup methods on SSL
connection directly.
+ the handshake_handler is actually a generic quic input handler
+ updated c->log->action and debug to reflect changes and be more informative
+ c->quic is always set in ngx_quic_input()
+ the quic connection state is set by the results of SSL_do_handshake();
note:
+ parameters are available in SSL connection since they are obtained by ssl
stack
quote:
During connection establishment, both endpoints make authenticated
declarations of their transport parameters. These declarations are
made unilaterally by each endpoint.
and really, we send our parameters before we read client's.
no handling of incoming parameters is made by this patch.
- integer parameters can be configured using the following directives:
quic_max_idle_timeout
quic_max_ack_delay
quic_max_packet_size
quic_initial_max_data
quic_initial_max_stream_data_bidi_local
quic_initial_max_stream_data_bidi_remote
quic_initial_max_stream_data_uni
quic_initial_max_streams_bidi
quic_initial_max_streams_uni
quic_ack_delay_exponent
quic_active_migration
quic_active_connection_id_limit
- only following parameters are actually sent:
active_connection_id_limit
initial_max_streams_uni
initial_max_streams_bidi
initial_max_stream_data_bidi_local
initial_max_stream_data_bidi_remote
initial_max_stream_data_uni
(other parameters are to be added into ngx_quic_create_transport_params()
function as needed, should be easy now)
- draft 24 and draft 27 are now supported
(at compile-time using quic_version macro)
The ngx_quic_parse_frame() functions now has new 'pkt' argument: the packet
header of a currently processed frame. This allows to log errors/debug
closer to reasons and perform additional checks regarding possible frame
types. The handler only performs processing of good frames.
A number of functions like read_uint32(), parse_int[_multi] probably should
be implemented as a macro, but currently it is better to have them as
functions for simpler debugging.
Cleanup in ngx_event_quic.c:
+ reorderded functions, structures
+ added missing prototypes
+ added separate handlers for each frame type
+ numerous indentation/comments/TODO fixes
+ removed non-implemented qc->state and corresponding enum;
this requires deep thinking, stub was unused.
+ streams inside quic connection are now in own structure
All code dealing with serializing/deserializing
is moved int srv/event/ngx_event_quic_transport.c/h file.
All macros for dealing with data are internal to source file.
The header file exposes frame types and error codes.
The exported functions are currently packet header parsers and writers
and frames parser/writer.
The ngx_quic_header_t structure is updated with 'log' member. This avoids
passing extra argument to parsing functions that need to report errors.
+ support for more than one initial packet
+ workaround for trailing zeroes in packet
+ ignore application data packet if no keys yet (issue in draft 27/ff nightly)
+ fixed PING frame parser
+ STREAM frames need to be acknowledged
The following HTTP configuration is used for firefox (v74):
http {
ssl_certificate_key localhost.key;
ssl_certificate localhost.crt;
ssl_protocols TLSv1.2 TLSv1.3;
server {
listen 127.0.0.1:10368 reuseport http3;
ssl_quic on;
server_name localhost;
location / {
return 200 "This-is-QUICK\n";
}
}
server {
listen 127.0.0.1:5555 ssl; # point the browser here
server_name localhost;
location / {
add_header Alt-Svc 'h3-24=":10368";ma=100';
return 200 "ALT-SVC";
}
}
}
New files:
src/event/ngx_event_quic_protection.h
src/event/ngx_event_quic_protection.c
The protection.h header provides interface to the crypto part of the QUIC:
2 functions to initialize corresponding secrets:
ngx_quic_set_initial_secret()
ngx_quic_set_encryption_secret()
and 2 functions to deal with packet processing:
ngx_quic_encrypt()
ngx_quic_decrypt()
Also, structures representing secrets are defined there.
All functions require SSL connection and a pool, only crypto operations
inside, no access to nginx connections or events.
Currently pool->log is used for the logging (instead of original c->log).
- events handling moved into src/event/ngx_event_quic.c
- http invokes once ngx_quic_run() and passes stream callback
(diff to original http_request.c is now minimal)
- streams are stored in rbtree using ID as a key
- when a new stream is registered, appropriate callback is called
- ngx_quic_stream_t type represents STREAM and stored in c->qs
- now NEW_CONNECTION_ID frames can be received and parsed
The packet structure is created in ngx_quic_input() and passed
to all handlers (initial, handshake and application data).
The UDP datagram buffer is saved as pkt->raw;
The QUIC packet is stored as pkt->data and pkt->len (instead of pkt->buf)
(pkt->len is adjusted after parsing headers to actual length)
The pkt->pos is removed, pkt->raw->pos is used instead.
- added basic parsing of ACK, PING and PADDING frames on input
- added preliminary parsing of SHORT headers
The ngx_quic_output() is now called after processing of each input packet.
Frames are added into output queue according to their level: inital packets
go ahead of handshake and application data, so they can be merged properly.
The payload handler is called from both new, handshake and applicataion data
handlers (latter is a stub).
As was objserved with ngtcp2 client, Finished CRYPTO frame within Handshake
packet may not be sent for some reason if there's nothing to append on 1-RTT.
This results in unnecessary retransmit. To avoid this edge case, a non-zero
active_connection_id_limit transport parameter is now used to append datagram
with NEW_CONNECTION_ID 1-RTT frames.
Now handshake generates frames, and they are queued in c->quic->frames.
The ngx_quic_output() is called from ngx_quic_flush_flight() or manually,
processes the queue and encrypts all frames according to required encryption
level.
ngx_quic_hexdump0(log, format, buffer, buffer_size);
- logs hexdump of buffer to specified error log
ngx_quic_hexdump0(c->log, "this is foo:", foo.data, foo.len);
ngx_quic_hexdump(log, format, buffer, buffer_size, ...)
- same as hexdump0, but more format/args possible:
ngx_quic_hexdump(c->log, "a=%d b=%d, foo is:", foo.data, foo.len, a, b);
Introduced ngx_quic_input() and ngx_quic_output() as interface between
nginx and protocol. They are the only functions that are exported.
While there, added copyrights.
For NGINX troubleshooting/technical help, please visit our community forum instead of asking your questions here. We will politely redirect these types of questions to the forum.
- type:textarea
id:general
attributes:
label:What would you like to discuss?
description:Please provide as much context as possible. Remember that only general discussions related to the NGINX codebase will be addressed on GitHub. For anything else, please visit our [community forum](https://community.nginx.org/).
For NGINX troubleshooting/technical help, please visit our community forum instead of asking your questions here. We will politely redirect these types of questions to the forum.
- type:textarea
id:ideas
attributes:
label:What idea would you like to discuss?
description:Please provide as much context as possible. Remember that only ideas related to the NGINX codebase will be addressed on GitHub. For anything else, please visit our [community forum](https://community.nginx.org/).
For NGINX troubleshooting/technical help, please visit our community forum instead of asking your questions here. We will politely redirect these types of questions to the forum.
- type:textarea
id:q-a
attributes:
label:What question do you have?
description:Please provide as much context as possible. Remember that only questions related to the NGINX codebase will be addressed on GitHub. For anything else, please visit our [community forum](https://community.nginx.org/).
Thanks for taking the time to fill out this bug report!
Before you continue filling out this report, please take a moment to check that your bug has not been [already reported on GitHub][issue search], is reproducible with the latest version of nginx, and does not involve any third-party modules 🙌
Remember to redact any sensitive information such as authentication credentials and/or license keys!
**Note:**If you are seeking community support, please start a new topic in the [NGINX Community forum][forum]. If you wish to discuss the codebase, please start a new thread via [GitHub discussions][discussions].
Thanks for taking the time to fill out this feature request!
Before you continue filling out this request, please take a moment to check that your feature has not been [already requested on GitHub][issue search] 🙌
**Note:**If you are seeking community support, please start a new topic in the [NGINX Community forum][forum]. If you wish to discuss the codebase, please start a new thread via [GitHub discussions][discussions].
Describe the use case and detail of the change. If this PR addresses an issue on GitHub, make sure to include a link to that issue using one of the [supported keywords](https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue) in this PR's description or commit message.
### Checklist
Before creating a PR, run through this checklist and mark each as complete:
- [ ] I have read the [contributing guidelines](/CONTRIBUTING.md).
- [ ] I have checked that NGINX compiles and runs after adding my changes.
if:(github.event.comment.body == 'recheck' || github.event.comment.body == 'I have hereby read the F5 CLA and agree to its terms') || github.event_name == 'pull_request_target'
custom-notsigned-prcomment:'🎉 Thank you for your contribution! It appears you have not yet signed the [F5 Contributor License Agreement (CLA)](https://github.com/f5/f5-cla/blob/main/docs/f5_cla.md), which is required for your changes to be incorporated into an F5 Open Source Software (OSS) project. Please kindly read the [F5 CLA](https://github.com/f5/f5-cla/blob/main/docs/f5_cla.md) and reply on a new comment with the following text to agree:'
custom-pr-sign-comment:'I have hereby read the F5 CLA and agree to its terms'
custom-allsigned-prcomment:'✅ All required contributors have signed the F5 CLA for this PR. Thank you!'
# Remote repository storing CLA signatures.
remote-organization-name:f5
remote-repository-name:f5-cla-data
# Branch where CLA signatures are stored.
branch:main
path-to-signatures:signatures/signatures.json
# Comma separated list of usernames for maintainers or any other individuals who should not be prompted for a CLA.
# NOTE: You will want to edit the usernames to suit your project needs.
[](https://www.repostatus.org/#active)
[](/CODE_OF_CONDUCT.md)
NGINX (pronounced "engine x" or "en-jin-eks") is the world's most popular Web Server, high performance Load Balancer, Reverse Proxy, API Gateway and Content Cache.
NGINX is free and open source software, distributed under the terms of a simplified [2-clause BSD-like license](LICENSE).
Enterprise distributions, commercial support and training are available from [F5, Inc](https://www.f5.com/products/nginx).
> [!IMPORTANT]
> The goal of this README is to provide a basic, structured introduction to NGINX for novice users. Please refer to the [full NGINX documentation](https://nginx.org/en/docs/) for detailed information on [installing](https://nginx.org/en/docs/install.html), [building](https://nginx.org/en/docs/configure.html), [configuring](https://nginx.org/en/docs/dirindex.html), [debugging](https://nginx.org/en/docs/debugging_log.html), and more. These documentation pages also contain a more detailed [Beginners Guide](https://nginx.org/en/docs/beginners_guide.html), How-Tos, [Development guide](https://nginx.org/en/docs/dev/development_guide.html), and a complete module and [directive reference](https://nginx.org/en/docs/dirindex.html).
# Table of contents
- [How it works](#how-it-works)
- [Modules](#modules)
- [Configurations](#configurations)
- [Runtime](#runtime)
- [Downloading and installing](#downloading-and-installing)
- [Stable and Mainline binaries](#stable-and-mainline-binaries)
- [Cloning the NGINX GitHub repository](#cloning-the-nginx-github-repository)
- [Configuring the build](#configuring-the-build)
- [Compiling](#compiling)
- [Location of binary and installation](#location-of-binary-and-installation)
- [Running and testing the installed binary](#running-and-testing-the-installed-binary)
- [Asking questions and reporting issues](#asking-questions-and-reporting-issues)
- [Contributing code](#contributing-code)
- [Additional help and resources](#additional-help-and-resources)
- [Changelog](#changelog)
- [License](#license)
# How it works
NGINX is installed software with binary packages available for all major operating systems and Linux distributions. See [Tested OS and Platforms](https://nginx.org/en/#tested_os_and_platforms) for a full list of compatible systems.
> [!IMPORTANT]
> While nearly all popular Linux-based operating systems are distributed with a community version of nginx, we highly advise installation and usage of official [packages](https://nginx.org/en/linux_packages.html) or sources from this repository. Doing so ensures that you're using the most recent release or source code, including the latest feature-set, fixes and security patches.
## Modules
NGINX is comprised of individual modules, each extending core functionality by providing additional, configurable features. See "Modules reference" at the bottom of [nginx documentation](https://nginx.org/en/docs/) for a complete list of official modules.
NGINX modules can be built and distributed as static or dynamic modules. Static modules are defined at build-time, compiled, and distributed in the resulting binaries. See [Dynamic Modules](#dynamic-modules) for more information on how they work, as well as, how to obtain, install, and configure them.
> [!TIP]
> You can issue the following command to see which static modules your NGINX binaries were built with:
```bash
nginx -V
```
> See [Configuring the build](#configuring-the-build) for information on how to include specific Static modules into your nginx build.
## Configurations
NGINX is highly flexible and configurable. Provisioning the software is achieved via text-based config file(s) accepting parameters called "[Directives](https://nginx.org/en/docs/dirindex.html)". See [Configuration File's Structure](https://nginx.org/en/docs/beginners_guide.html#conf_structure) for a comprehensive description of how NGINX configuration files work.
> [!NOTE]
> The set of directives available to your distribution of NGINX is dependent on which [modules](#modules) have been made available to it.
## Runtime
Rather than running in a single, monolithic process, NGINX is architected to scale beyond Operating System process limitations by operating as a collection of processes. They include:
- A "master" process that maintains worker processes, as well as, reads and evaluates configuration files.
- One or more "worker" processes that process data (eg. HTTP requests).
The number of [worker processes](https://nginx.org/en/docs/ngx_core_module.html#worker_processes) is defined in the configuration file and may be fixed for a given configuration or automatically adjusted to the number of available CPU cores. In most cases, the latter option optimally balances load across available system resources, as NGINX is designed to efficiently distribute work across all worker processes.
> [!TIP]
> Processes synchronize data through shared memory. For this reason, many NGINX directives require the allocation of shared memory zones. As an example, when configuring [rate limiting](https://nginx.org/en/docs/http/ngx_http_limit_req_module.html#limit_req), connecting clients may need to be tracked in a [common memory zone](https://nginx.org/en/docs/http/ngx_http_limit_req_module.html#limit_req_zone) so all worker processes can know how many times a particular client has accessed the server in a span of time.
# Downloading and installing
Follow these steps to download and install precompiled NGINX binaries. You may also choose to [build NGINX locally from source code](#building-from-source).
## Stable and Mainline binaries
NGINX binaries are built and distributed in two versions: stable and mainline. Stable binaries are built from stable branches and only contain critical fixes backported from the mainline version. Mainline binaries are built from the [master branch](https://github.com/nginx/nginx/tree/master) and contain the latest features and bugfixes. You'll need to [decide which is appropriate for your purposes](https://docs.nginx.com/nginx/admin-guide/installing-nginx/installing-nginx-open-source/#choosing-between-a-stable-or-a-mainline-version).
## Linux binary installation process
The NGINX binary installation process takes advantage of package managers native to specific Linux distributions. For this reason, first-time installations involve adding the official NGINX package repository to your system's package manager. Follow [these steps](https://nginx.org/en/linux_packages.html) to download, verify, and install NGINX binaries using the package manager appropriate for your Linux distribution.
### Upgrades
Future upgrades to the latest version can be managed using the same package manager without the need to manually download and verify binaries.
## FreeBSD installation process
For more information on installing NGINX on FreeBSD system, visit https://nginx.org/en/docs/install.html
## Windows executables
Windows executables for mainline and stable releases can be found on the main [NGINX download page](https://nginx.org/en/download.html). Note that the current implementation of NGINX for Windows is at the Proof-of-Concept stage and should only be used for development and testing purposes. For additional information, please see [nginx for Windows](https://nginx.org/en/docs/windows.html).
## Dynamic modules
NGINX version 1.9.11 added support for [Dynamic Modules](https://nginx.org/en/docs/ngx_core_module.html#load_module). Unlike Static modules, dynamically built modules can be downloaded, installed, and configured after the core NGINX binaries have been built. [Official dynamic module binaries](https://nginx.org/en/linux_packages.html#dynmodules) are available from the same package repository as the core NGINX binaries described in previous steps.
> [!TIP]
> [NGINX JavaScript (njs)](https://github.com/nginx/njs), is a popular NGINX dynamic module that enables the extension of core NGINX functionality using familiar JavaScript syntax.
> [!IMPORTANT]
> If desired, dynamic modules can also be built statically into NGINX at compile time.
# Getting started with NGINX
For a gentle introduction to NGINX basics, please see our [Beginner’s Guide](https://nginx.org/en/docs/beginners_guide.html).
## Installing SSL certificates and enabling TLS encryption
See [Configuring HTTPS servers](https://nginx.org/en/docs/http/configuring_https_servers.html) for a quick guide on how to enable secure traffic to your NGINX installation.
## Load Balancing
For a quick start guide on configuring NGINX as a Load Balancer, please see [Using nginx as HTTP load balancer](https://nginx.org/en/docs/http/load_balancing.html).
## Rate limiting
See our [Rate Limiting with NGINX](https://blog.nginx.org/blog/rate-limiting-nginx) blog post for an overview of core concepts for provisioning NGINX as an API Gateway.
## Content caching
See [A Guide to Caching with NGINX and NGINX Plus](https://blog.nginx.org/blog/nginx-caching-guide) blog post for an overview of how to use NGINX as a content cache (e.g. edge server of a content delivery network).
# Building from source
The following steps can be used to build NGINX from source code available in this repository.
## Installing dependencies
Most Linux distributions will require several dependencies to be installed in order to build NGINX. The following instructions are specific to the `apt` package manager, widely available on most Ubuntu/Debian distributions and their derivatives.
> [!TIP]
> It is always a good idea to update your package repository lists prior to installing new packages.
> ```bash
> sudo apt update
> ```
### Installing compiler and make utility
Use the following command to install the GNU C compiler and Make utility.
```bash
sudo apt install gcc make
```
### Installing dependency libraries
```bash
sudo apt install libpcre3-dev zlib1g-dev
```
> [!WARNING]
> This is the minimal set of dependency libraries needed to build NGINX with rewriting and gzip capabilities. Other dependencies may be required if you choose to build NGINX with additional modules. Monitor the output of the `configure` command discussed in the following sections for information on which modules may be missing. For example, if you plan to use SSL certificates to encrypt traffic with TLS, you'll need to install the OpenSSL library. To do so, issue the following command.
>```bash
>sudo apt install libssl-dev
## Cloning the NGINX GitHub repository
Using your preferred method, clone the NGINX repository into your development directory. See [Cloning a GitHub Repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository) for additional help.
```bash
git clone https://github.com/nginx/nginx.git
```
## Configuring the build
Prior to building NGINX, you must run the `configure` script with [appropriate flags](https://nginx.org/en/docs/configure.html). This will generate a Makefile in your NGINX source root directory that can then be used to compile NGINX with [options specified during configuration](https://nginx.org/en/docs/configure.html).
From the NGINX source code repository's root directory:
```bash
auto/configure
```
> [!IMPORTANT]
> Configuring the build without any flags will compile NGINX with the default set of options. Please refer to https://nginx.org/en/docs/configure.html for a full list of available build configuration options.
## Compiling
The `configure` script will generate a `Makefile` in the NGINX source root directory upon successful execution. To compile NGINX into a binary, issue the following command from that same directory:
```bash
make
```
## Location of binary and installation
After successful compilation, a binary will be generated at `<NGINX_SRC_ROOT_DIR>/objs/nginx`. To install this binary, issue the following command from the source root directory:
```bash
sudo make install
```
> [!IMPORTANT]
> The binary will be installed into the `/usr/local/nginx/` directory.
## Running and testing the installed binary
To run the installed binary, issue the following command:
```bash
sudo /usr/local/nginx/sbin/nginx
```
You may test NGINX operation using `curl`.
```bash
curl localhost
```
The output of which should start with:
```html
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
```
# Asking questions and reporting issues
See our [Support](SUPPORT.md) guidelines for information on how discuss the codebase, ask troubleshooting questions, and report issues.
# Contributing code
Please see the [Contributing](CONTRIBUTING.md) guide for information on how to contribute code.
# Additional help and resources
- See the [NGINX Community Blog](https://blog.nginx.org/) for more tips, tricks and HOW-TOs related to NGINX and related projects.
- Access [nginx.org](https://nginx.org/), your go-to source for all documentation, information and software related to the NGINX suite of projects.
# Changelog
See our [changelog](https://nginx.org/en/CHANGES) to keep track of updates.
# License
[2-clause BSD-like license](LICENSE)
---
Additional documentation available at: https://nginx.org/en/docs
This document provides an overview of security concerns related to nginx
deployments, focusing on confidentiality, integrity, availability, and the
implications of configurations and misconfigurations.
## Reporting a Vulnerability
Please report any vulnerabilities via one of the following methods
(in order of preference):
1. [Report a vulnerability](https://docs.github.com/en/code-security/security-advisories/guidance-on-reporting-and-writing-information-about-vulnerabilities/privately-reporting-a-security-vulnerability)
within this repository. We are using the GitHub workflow that allows us to
manage vulnerabilities in a private manner and interact with reporters
securely.
2. [Report directly to F5](https://www.f5.com/services/support/report-a-vulnerability).
3. Report via email to security-alert@nginx.org.
This method will be deprecated in the future.
### Vulnerability Disclosure and Fix Process
The nginx team expects that all suspected vulnerabilities be reported
privately via the
[Reporting a Vulnerability](SECURITY.md#reporting-a-vulnerability) guidelines.
If a publicly released vulnerability is reported, we
may request to handle it according to the private disclosure process.
If the reporter agrees, we will follow the private disclosure process.
Security fixes will be applied to all supported stable releases, as well
as the mainline version, as applicable. We recommend using the most recent
mainline or stable release of nginx. Fixes are created and tested by the core
team using a GitHub private fork for security. If necessary, the reporter
may be invited to contribute to the fork and assist with the solution.
The nginx team is committed to responsible information disclosure with
sufficient detail, such as the CVSS score and vector. Privately disclosed
vulnerabilities are embargoed by default until the fix is released.
Communications and fixes remain private until made public. As nginx is