When using OpenSSL 3.5, the crypto_release_rcd QUIC callback can be
called late, after the QUIC connection was already closed on handshake
failure, resulting in a segmentation fault. For instance, it happened
if a client Finished message didn't align with a record boundary.
This commit is prepared for HTTP/2 and HTTP/3 support.
The ALPN protocol is now set per-connection in
ngx_http_upstream_ssl_init_connection(), allowing proper protocol negotiation
for each individual upstream connection regardless of SSL context sharing.
Chunked transfer encoding, since originally introduced in HTTP/1.1
in RFC 2068, is specified to use CRLF as the only line terminator.
Although tolerant applications may recognize a single LF, formally
this covers the start line and fields, and doesn't apply to chunks.
Strict chunked parsing is reaffirmed as intentional in RFC errata
ID 7633, notably "because it does not have to retain backwards
compatibility with 1.0 parsers".
A general RFC 2616 recommendation to tolerate deviations whenever
interpreted unambiguously doesn't apply here, because chunked body
is used to determine HTTP message framing; a relaxed parsing may
cause various security problems due to a broken delimitation.
For instance, this is possible when receiving chunked body from
intermediates that blindly parse chunk-ext or a trailer section
until CRLF, and pass it further without re-coding.
- Issue templates are replaced with forms. Forms allow to explicitly ask
for certain info before an issue is opened, they can be programmatically
queried via GH actions to get the data in fields.
- Added language around GH discussions vs the forum in the issue forms.
- Added GH discussions templates. These templates delineate which types
of discussions belong on GitHub vs the community forum.
- Created SUPPORT.md to delineate which types of topics belong on GitHub
vs different support channels (community forum/docs/commercial support).
- Updated CONTRIBUTING.md:
- Removed text that belongs in SUPPORT.md.
- Added F5 CLA clarifying text.
- Added badges to README.md. Most of these are there to make information
even clearer, moreso for users reading README.md from sources outside
GitHub.
If request URI was shorter than location prefix, as after replacement
with try_files, location length was used to copy the remaining URI part
leading to buffer overread.
The fix is to replace full request URI in this case. In the following
configuration, request "/123" is changed to "/" when sent to backend.
location /1234 {
try_files /123 =404;
proxy_pass http://127.0.0.1:8080/;
}
Closes#983 on GitHub.
This allows to process a port subcomponent and save it in r->port
in a unified way, similar to r->headers_in.server. For HTTP/1.x
request line in the absolute form, r->host_end now includes a port
subcomponent, which is also consistent with HTTP/2 and HTTP/3.
Validation is rewritten to follow RFC 3986 host syntax, based on
ngx_http_parse_request_line(). The following is now rejected:
- the rest of gen-delims "#", "?", "@", "[", "]"
- other unwise delims <">, "<", ">", "\", "^", "`', "{", "|", "}"
- IP literals with a trailing dot, missing closing bracket, or pct-encoded
- a port subcomponent with invalid values
- characters in upper half
In addition to moving memcpy() under the length condition in 15bf6d8cc,
which addressed a reported UB due to string function conventions, this
is repeated for advancing an input buffer, to make the resulting code
more clean and readable.
Additionally, although considered harmless for both string functions and
additive operators, as previously discussed in GitHub PR 866, this fixes
the main source of annoying sanitizer reports in the module.
Prodded by UndefinedBehaviorSanitizer (pointer-overflow).
The function interface is changed to follow a common approach
to other functions used to setup SSL_CTX, with an exception of
"ngx_conf_t *cf" since it is not bound to nginx configuration.
This is required to report and propagate SSL_CTX_set_ex_data()
errors, as reminded by Coverity (CID 1668589).
For certain compilers we embed the compiler version used to build nginx
in the binary, retrievable via 'nginx -V', e.g.
$ ./objs/nginx -V
...
built by gcc 15.2.1 20250808 (Red Hat 15.2.1-1) (GCC)
...
However if the CFLAGS environment variable is set this would be omitted.
This is due to the compiler specific auto/cc files not being run when
the CFLAGS environment variable is set, this is so entities can set
their own compiler flags, and thus the NGX_COMPILER variable isn't set.
Nonetheless it is a useful thing to have so re-work the auto scripts to
move the version gathering out of the individual auto/cc/$NGX_CC_NAME
files and merge them into auto/cc/name.
Link: <https://github.com/nginx/nginx/issues/878>
The new directives add_header_inherit and add_trailer_inherit allow
to alter inheritance rules for the values specified in the add_header
and add_trailer directives in a convenient way.
The "merge" parameter enables appending the values from the previous level
to the current level values.
The "off" parameter cancels inheritance of the values from the previous
configuration level, similar to add_header "" (2194e75bb).
The "on" parameter (default) enables the standard inheritance behaviour,
which is to inherit values from the previous level only if there are no
directives on the current level.
The inheritance rules themselves are inherited in a standard way. Thus,
for example, "add_header_inherit merge;" specified at the top level will
be inherited in all nested levels recursively unless redefined below.
Variables contain the IANA name of the signature scheme[1] used to sign
the TLS handshake.
Variables are only meaningful when using OpenSSL 3.5 and above, with older
versions they are empty. Moreover, since this data isn't stored in a
serialized session, variables are only available for new sessions.
[1] https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml
Requested by willmafh.
After f10bc5a763 the address was set to NULL only when local address was
not specified at all. In case complex value evaluated to an empty or
invalid string, local address remained unchanged. Currenrly this is not
a problem since the value is only set once. This change is a preparation
for being able to change the local address after initial setting.
The change allows modules to use the CONNECT method with HTTP/1.1 requests.
To do so, they need to set the "allow_connect" flag in the core server
configuration.
The $request_port variable contains the port passed by the client in the
request line (for HTTP/1.x) or ":authority" pseudo-header (for HTTP/2 and
HTTP/3). If the request line contains no host, or ":authority" is missing,
then $request_port is taken from the "Host" header, similar to the $host
variable.
The $is_request_port variable contains ":" if $request_port is non-empty,
and is empty otherwise.
BoringSSL/AWS-LC provide two callbacks for each compression algorithm,
which may be used to compress and decompress certificates in runtime.
This change implements compression support with zlib, as enabled with
the ssl_certificate_compression directive. Compressed certificates
are stored in certificate exdata and reused in subsequent connections.
Notably, AWS-LC saves an X509 pointer in SSL connection, which allows
to use it from SSL_get_certificate() for caching purpose. In contrast,
BoringSSL reconstructs X509 on-the-fly, though given that it doesn't
support multiple certificates, always replacing previously configured
certificates, we use the last configured one from ssl->certs, instead.
In rare cases, it was possible to get into this error state on reload
with improperly updated file timestamps for certificate and key pairs.
The fix is to retry on X509_R_KEY_VALUES_MISMATCH, similar to 5d5d9adcc.
Additionally, loading SSL certificate is updated to avoid certificates
discarded on retry to appear in ssl->certs and in extra chain.
The change introduces an SNI based virtual server selection during
early ClientHello processing. The callback is available since
OpenSSL 1.1.1; for older OpenSSL versions, the previous behaviour
is kept.
Using the ClientHello callback sets a reasonable processing order
for the "server_name" TLS extension. Notably, session resumption
decision now happens after applying server configuration chosen by
SNI, useful with enabled verification of client certificates, which
brings consistency with BoringSSL behaviour. The change supersedes
and reverts a fix made in 46b9f5d38 for TLSv1.3 resumed sessions.
In addition, since the callback is invoked prior to the protocol
version negotiation, this makes it possible to set "ssl_protocols"
on a per-virtual server basis.
To keep the $ssl_server_name variable working with TLSv1.2 resumed
sessions, as previously fixed in fd97b2a80, a limited server name
callback is preserved in order to acknowledge the extension.
Note that to allow third-party modules to properly chain the call to
ngx_ssl_client_hello_callback(), the servername callback function is
passed through exdata.
This was broken in 7468a10b6 (1.29.0), resulting in a missing diagnostics
and SSL error queue not cleared for SSL handshakes rejected by SNI, seen
as "ignoring stale global SSL error" alerts, for instance, when doing SSL
shutdown of a long standing connection after rejecting another one by SNI.
The fix is to move the qc->error check after c->ssl->handshake_rejected is
handled first, to make the error queue cleared. Although not practicably
visible as needed, this is accompanied by clearing the error queue under
the qc->error case as well, to be on the safe side.
As an implementation note, due to the way of handling invalid transport
parameters for OpenSSL 3.5 and above, which leaves a passed pointer not
advanced on error, SSL_get_error() may return either SSL_ERROR_WANT_READ
or SSL_ERROR_WANT_WRITE depending on a library. To cope with that, both
qc->error and c->ssl->handshake_rejected checks were moved out of
"sslerr != SSL_ERROR_WANT_READ".
Also, this reconstructs a missing "SSL_do_handshake() failed" diagnostics
for the qc->error case, replacing using ngx_ssl_connection_error() with
ngx_connection_error(). It is made this way to avoid logging at the crit
log level because qc->error set is expected to have an empty error queue.
Reported and tested by Vladimir Homutov.
The example configuration previously specified 'charset koi8-r',
which is a legacy Cyrillic encoding. As koi8-r is rarely used today
and modern browsers handle UTF-8 by default, specifying the charset
explicitly is unnecessary. Removing the directive keeps the example
configuration concise and aligned with current best practices.
They might be reused in a session if an SMTP client proceeded
unauthenticated after previous invalid authentication attempts.
This could confuse an authentication server when passing stale
credentials along with "Auth-Method: none".
The condition to send the "Auth-Salt" header is similarly refined.
The ssl_certificate_compression directive allows to send compressed
server certificates. In OpenSSL, they are pre-compressed on startup.
To simplify configuration, the SSL_OP_NO_TX_CERTIFICATE_COMPRESSION
option is automatically cleared if certificates were pre-compressed.
SSL_CTX_compress_certs() may return an error in legitimate cases,
e.g., when none of compression algorithms is available or if the
resulting compressed size is larger than the original one, thus it
is silently ignored.
Certificate compression is supported in Chrome with brotli only,
in Safari with zlib only, and in Firefox with all listed algorithms.
It is supported since Ubuntu 24.10, which has OpenSSL with enabled
zlib and zstd support.
The actual list of algorithms supported in OpenSSL depends on how
the library was configured; it can be brotli, zlib, zstd as listed
in RFC 8879.
Certificate compression is supported since OpenSSL 3.2, it is enabled
automatically as negotiated in a TLSv1.3 handshake.
Using certificate compression and decompression in runtime may be
suboptimal in terms of CPU and memory consumption in certain typical
scenarios, hence it is disabled by default on both server and client
sides. It can be enabled with ssl_conf_command and similar directives
in upstream as appropriate, for example:
ssl_conf_command Options RxCertificateCompression;
ssl_conf_command Options TxCertificateCompression;
Compressing server certificates requires additional support, this is
addressed separately.
The function contains mostly HTTP/1.x specific request processing,
which has no use in other protocols. After the previous change in
HTTP/2, it can now be hidden.
This is an API change.
Previously, it misused the Host header processing resulting in
400 (Bad Request) errors for a valid request that contains both
":authority" and Host headers with the same value, treating it
after 37984f0be as if client sent more than one Host header.
Such an overly strict handling violates RFC 9113.
The fix is to process ":authority" as a distinct header, similarly
to processing an authority component in the HTTP/1.x request line.
This allows to disambiguate and compare Host and ":authority"
values after all headers were processed.
With this change, the ngx_http_process_request_header() function
can no longer be used here, certain parts were inlined similar to
the HTTP/3 module.
To provide compatibility for misconfigurations that use $http_host
to return the value of the ":authority" header, the Host header,
if missing, is now reconstructed from ":authority".
Previously, when using HTTP/2 over SSL, an early hints HEADERS frame was
queued in SSL buffer, and might not be immediately flushed. This resulted
in a delay of early hints delivery until the main response was sent.
The fix is to set the flush flag for the early hints HEADERS frame buffer.
RFC 9114, Section 4.3.1. specifies a restriction for :authority and Host
coexistence in an HTTP/3 request:
: If both fields are present, they MUST contain the same value.
Previously, this restriction was correctly enforced only for portless
values. When Host contained a port, the request failed as if :authority
and Host were different, regardless of :authority presence.
This happens because the value of r->headers_in.server used for :authority
has port stripped. The fix is to use r->host_start / r->host_end instead.
This might happen for Huffman encoded string literals as the result
of length expansion. Notably, the maximum length of string literals
is already limited with the "large_client_header_buffers" directive,
so this was only possible with nonsensically large configured limits.
The kevent udata field was changed from intptr_t to "void *",
similar to other BSDs and Darwin.
The NGX_KQUEUE_UDATA_T macro is adjusted to reflect that change,
fixing -Werror=int-conversion errors.
The NGX_KQUEUE_UDATA_T macro is used to compensate the incompatible
kqueue() API in NetBSD, it doesn't really belong to feature tests.
The change limits the macro visibility to the kqueue event module.
Moving from autotests also simplifies testing a particular NetBSD
version as seen in a subsequent change.
The kevent udata field is special in that we maintain compatibility
with NetBSD versions that predate using the "void *" type.
The fix is to cast to intermediate uintptr_t that is casted back to
"void *" where appropriate.
Prior to OpenSSL 3.0, OPENSSL_VERSION_NUMBER used the following format:
MNNFFPPS: major minor fix patch status
Where the status nibble (S) has 0+ for development and f for release.
The format was changed in OpenSSL 3.0.0, where it is always zero:
MNN00PP0: major minor patch
Disabling the build dependency feature is safe assuming that
nginx/Windows release zip is always built from a clean tree.
This allows to speed up total build time by around 40%.
As it may not be suitable in general, the option resides here
and not in configure.
In OpenSSL 3.5.0, the "quic_transport_parameters" extension set
internally by the QUIC API is cleared on the SSL context switch,
which disables sending QUIC transport parameters if switching to
a different server block on SNI. See the initial report in [1].
This is fixed post OpenSSL 3.5.0 [2]. The fix is anticipated in
OpenSSL 3.5.1, which has not been released yet. When building
with OpenSSL 3.5, OpenSSL compat layer is now used by default.
The OpenSSL 3.5 QUIC API support can be switched back using
--with-cc-opt='-DNGX_QUIC_OPENSSL_API=1'.
[1] https://github.com/nginx/nginx/issues/711
[2] https://github.com/openssl/openssl/commit/45bd3c3798
The gRPC module context has connection specific state, which can be lost
after request reinitialization when it comes to processing early hints.
The fix is to do only a portion of u->reinit_request() implementation
required after processing early hints, now inlined in modules.
Now NGX_HTTP_UPSTREAM_EARLY_HINTS is returned from u->process_header()
for early hints. When reading a cached response, this code is mapped
to NGX_HTTP_UPSTREAM_INVALID_HEADER to indicate invalid header format.
There were a few random places where 0 was being used as a null pointer
constant.
We have a NULL macro for this very purpose, use it.
There is also some interest in actually deprecating the use of 0 as a
null pointer constant in C.
This was found with -Wzero-as-null-pointer-constant which was enabled
for C in GCC 15 (not enabled with Wall or Wextra... yet).
Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059>
The functions ngx_http_merge_types() & ngx_conf_merge_path_value()
return either NGX_CONF_OK aka NULL aka ((void *)0) (probably) or
NGX_CONF_ERROR aka ((void *)-1).
They don't return an integer constant which is what NGX_OK aka (0) is.
Lets use the right thing in the function return check.
This was found with -Wzero-as-null-pointer-constant which was enabled
for C in GCC 15 (not enabled with Wall or Wextra... yet).
Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059>
The change implements processing upstream early hints response in
ngx_http_proxy_module and ngx_http_grpc_module. A new directive
"early_hints" enables sending early hints to the client. By default,
sending early hints is disabled.
Example:
map $http_sec_fetch_mode $early_hints {
navigate $http2$http3;
}
early_hints $early_hints;
proxy_pass http://example.com;
The support first appeared in OS X Mavericks 10.9 and documented since
OS X Yosemite 10.10.
It has a subtle implementation difference from other operating systems
in that the TCP_KEEPALIVE socket option (used in place of TCP_KEEPIDLE)
isn't inherited from a listening socket to an accepted socket.
An apparent reason for this behaviour is that it might be preserved for
the sake of backward compatibility. The TCP_KEEPALIVE socket option is
not inherited since appearance in OS X Panther 10.3, which long predates
two other TCP_KEEPINTVL and TCP_KEEPCNT socket options.
Thanks to Andy Pan for initial work.
Certain providers may attempt to reload the key on the first use after a
fork. Such attempt would require re-prompting the pin, and this time we
are not able to pass the password callback.
While it is addressable with configuration for a specific provider, it would
be prudent to ensure that no such prompts could block worker processes by
setting the default UI method.
UI_null() first appeared in 1.1.1 along with the OSSL_STORE, so it is safe
to assume the same set of guards.
A new "store:..." prefix for the "ssl_certificate_key" directive allows
loading keys via the OSSL_STORE API.
The change is required to support hardware backed keys in OpenSSL 3.x using
the new "provider(7ossl)" modules, such as "pkcs11-provider". While the
engine API is present in 3.x, some operating systems (notably, RHEL10)
have already disabled it in their builds of OpenSSL.
Related: https://trac.nginx.org/nginx/ticket/2449
Similarly to the QUIC API originated in BoringSSL, this API allows
to register custom TLS callbacks for an external QUIC implementation.
See the SSL_set_quic_tls_cbs manual page for details.
Due to a different approach used in OpenSSL 3.5, handling of CRYPTO
frames was streamlined to always write an incoming CRYPTO buffer to
the crypto context. Using SSL_provide_quic_data(), this results in
transient allocation of chain links and buffers for CRYPTO frames
received in order. Testing didn't reveal performance degradation of
QUIC handshakes, https://github.com/nginx/nginx/pull/646 provides
specific results.
Using SSL_in_init() to inspect a handshake state was replaced with
SSL_is_init_finished(). This represents a more complete fix to the
BoringSSL issue addressed in 22671b37e.
This provides awareness of the early data handshake state when using
OpenSSL 3.5 TLS callbacks in 0-RTT enabled configurations, which, in
particular, is used to avoid premature completion of the initial TLS
handshake, before required client handshake messages are received.
This is a non-functional change when using BoringSSL. It supersedes
testing non-positive SSL_do_handshake() results in all supported SSL
libraries, hence simplified.
In preparation for using OpenSSL 3.5 TLS callbacks.
Encryption level values are decoupled from ssl_encryption_level_t,
which is now limited to BoringSSL QUIC callbacks, with mappings
provided. Although the values match, this provides a technically
safe approach, in particular, to access protection level sized arrays.
In preparation for using OpenSSL 3.5 TLS callbacks.
It is now called from ngx_quic_handle_crypto_frame(), prior to proceeding
with the handshake. With this logic removed, the handshake function is
renamed to ngx_quic_handshake() to better match ngx_ssl_handshake().
All definitions now set in ngx_event_quic.h, this includes moving
NGX_QUIC_OPENSSL_COMPAT from autotests to compile time. Further,
to improve code readability, a new NGX_QUIC_QUICTLS_API macro is
used for QuicTLS that provides old BoringSSL QUIC API.
Previously, they might be logged on every add_handshake_data
callback invocation when using OpenSSL compat layer and processing
coalesced handshake messages.
Further, the ALPN error message is adjusted to signal the missing
extension. Possible reasons were previously narrowed down with
ebb6f7d65 changes in the ALPN callback that is invoked earlier in
the handshake.
Following the previous change that removed posting a close event
in OpenSSL compat layer, now ngx_quic_close_connection() is always
called on error path with either NGX_ERROR or qc->error set.
This allows to remove a special value -1 served as a missing error,
which simplifies the code. Partially reverts d3fb12d77.
Also, this improves handling of the draining connection state, which
consists of posting a close event with NGX_OK and no qc->error set,
where it was previously converted to NGX_QUIC_ERR_INTERNAL_ERROR.
Notably, this is rather a cosmetic fix, because drained connections
do not send any packets including CONNECTION_CLOSE, and qc->error
is not otherwise used.
Changed handshake callbacks to always return success. This allows to avoid
logging SSL_do_handshake() errors with empty or cryptic "internal error"
OpenSSL error messages at the inappropriate "crit" log level.
Further, connections with failed callbacks are closed now right away when
using OpenSSL compat layer. This change supersedes and reverts c37fdcdd1,
with the conditions to check callbacks invocation kept to slightly improve
code readability of control flow; they are optimized out in the resulting
assembly code.
Logging level for such errors, which should not normally happen,
is changed to NGX_LOG_ALERT, and ngx_log_error() is replaced with
ngx_ssl_error() for consistency with the rest of the code.
Previously, it was not possible to send acknowledgments if the
congestion window was limited or temporarily exceeded, such as
after sending a large response or MTU probe. If ACKs were not
received from the peer for some reason to update the in-flight
bytes counter below the congestion window, this might result in
a stalled connection.
The fix is to send ACKs regardless of congestion control. This
meets RFC 9002, Section 7:
: Similar to TCP, packets containing only ACK frames do not count
: toward bytes in flight and are not congestion controlled.
This is a simplified implementation to send ACK frames from the
head of the queue. This was made possible after 6f5f17358.
Reported in trac ticket #2621 and subsequently by Vladimir Homutov:
https://mailman.nginx.org/pipermail/nginx-devel/2025-April/ZKBAWRJVQXSZ2ISG3YJAF3EWMDRDHCMO.html
This extends the target selection implemented in dad6ec3aa6 to support
Windows ARM64 platforms. OpenSSL support for VC-WIN64-ARM target first
appeared in 1.1.1 and is present in all currently supported (3.x)
branches.
As a side effect, ARM64 Windows builds will get 16-byte alignment along
with the rest of non-x86 platforms. This is safe, as malloc on 64-bit
Windows guarantees the fundamental alignment of allocations, 16 bytes.
Previously, the default pool alignment used sizeof(unsigned long), with
the expectation that this would match to a platform word size. Certain
64-bit platforms prove this assumption wrong by keeping the 32-bit long
type, which is fully compliant with the C standard.
This introduces a possibility of suboptimal misaligned access to the
data allocated with ngx_palloc() on the affected platforms, which is
addressed here by changing the default NGX_ALIGNMENT to a pointer size.
As we override the detection in auto/os/conf for all the machine types
except x86, and Unix-like 64-bit systems prefer the 64-bit long, the
impact of the change should be limited to Win64 x64.
After fixing ngx_http_v3_encode_varlen_int() in 400eb1b628,
NGX_HTTP_V3_VARLEN_INT_LEN retained the old value of 4, which is
insufficient for the values over 1073741823 (1G - 1).
The NGX_HTTP_V3_VARLEN_INT_LEN macro is used in ngx_http_v3_uni.c to
format stream and frame types. Old buffer size is enough for formatting
this data. Also, the macro is used in ngx_http_v3_filter_module.c to
format output chunks and trailers. Considering output_buffers and
proxy_buffer_size are below 1G in all realistic scenarios, the old buffer
size is enough here as well.
RFC 9002, Section 6.1.1 defines packet reordering threshold as 3. Testing
shows that such low value leads to spurious packet losses followed by
congestion window collapse. The change implements dynamic packet threshold
detection based on in-flight packet range. Packet threshold is defined
as half the number of in-flight packets, with mininum value of 3.
Also, renamed ngx_quic_lost_threshold() to ngx_quic_time_threshold()
for better compliance with RFC 9002 terms.
Previosly the threshold was hardcoded at 10000. This value is too low for
high BDP networks. For example, if all frames are STREAM frames, and MTU
is 1500, the upper limit for congestion window would be roughly 15M
(10000 * 1500). With 100ms RTT it's just a 1.2Gbps network (15M * 10 * 8).
In reality, the limit is even lower because of other frame types. Also,
the number of frames that could be used simultaneously depends on the total
amount of data buffered in all server streams, and client flow control.
The change sets frame threshold based on max concurrent streams and stream
buffer size, the product of which is the maximum number of in-flight stream
data in all server streams at any moment. The value is divided by 2000 to
account for a typical MTU 1500 and the fact that not all frames are STREAM
frames.
If connection is network-limited, MTU probes have little chance of being
sent since congestion window is almost always full. As a result, PMTUD
may not be able to reach the real MTU and the connection may operate with
a reduced MTU. The solution is to ignore the congestion window. This may
lead to a temporary increase in in-flight count beyond congestion window.
As per RFC 9000, Section 14.4:
Loss of a QUIC packet that is carried in a PMTU probe is therefore
not a reliable indication of congestion and SHOULD NOT trigger a
congestion control reaction.
Previously, these functions operated on a per-level basis. This however
resulted in excessive logging of in_flight and will also led to extra
work detecting underutilized congestion window in the followup patches.
On some systems the value of ngx_current_msec is derived from monotonic
clock, for which the following is defined by POSIX:
For this clock, the value returned by clock_gettime() represents
the amount of time (in seconds and nanoseconds) since an unspecified
point in the past.
As as result, overflow protection is needed when comparing two ngx_msec_t.
The change adds such protection to the ngx_quic_detect_lost() function.
Since recovery_start field was initialized with ngx_current_msec, all
congestion events that happened within the same millisecond or cycle
iteration, were treated as in recovery mode.
Also, when handling persistent congestion, initializing recovery_start
with ngx_current_msec resulted in treating all sent packets as in recovery
mode, which violates RFC 9002, see example in Appendix B.8.
While here, also fixed recovery_start wrap protection. Previously it used
2 * max_idle_timeout time frame for all sent frames, which is not a
reliable protection since max_idle_timeout is unrelated to congestion
control. Now recovery_start <= now condition is enforced. Note that
recovery_start wrap is highly unlikely and can only occur on a
32-bit system if there are no congestion events for 24 days.
Previously, the expiration caused QUIC connection finalization even if
there are application-terminated streams finishing sending data. Such
finalization terminated these streams.
An easy way to trigger this is to request a large file from HTTP/3 over
a small MTU. In this case keepalive timeout expiration may abruptly
terminate the request stream.
Improved logging for simpler data extraction for plotting congestion
window graphs. In particular, added current milliseconds number from
ngx_current_msec.
While here, simplified logging text and removed irrelevant data.
Starting with OpenSSL 3.0, groups may be added externally with pluggable
KEM providers. Using SSL_get_negotiated_group(), which makes lookup in a
static table with known groups, doesn't allow to list such groups by names
leaving them in hex. Adding X25519MLKEM768 to the default group list in
OpenSSL 3.5 made this problem more visible. SSL_get0_group_name() and,
apparently, SSL_group_to_name() allow to resolve such provider-implemented
groups, which is also "generally preferred" over SSL_get_negotiated_group()
as documented in OpenSSL git commit 93d4f6133f.
This change makes external groups listing by name using SSL_group_to_name()
available since OpenSSL 3.0. To preserve "prime256v1" naming for the group
0x0017, and to avoid breaking BoringSSL and older OpenSSL versions support,
it is used supplementary for a group that appears to be unknown.
See https://github.com/openssl/openssl/issues/27137 for related discussion.
Passwords were not preserved in optimized SSL contexts, the bug had
appeared in d791b4aab (1.23.1), as in the following configuration:
server {
proxy_ssl_password_file password;
proxy_ssl_certificate $ssl_server_name.crt;
proxy_ssl_certificate_key $ssl_server_name.key;
location /original/ {
proxy_pass https://u1/;
}
location /optimized/ {
proxy_pass https://u2/;
}
}
The fix is to always preserve passwords, by copying to the configuration
pool, if dynamic certificates are used. This is done as part of merging
"ssl_passwords" configuration.
To minimize the number of copies, a preserved version is then used for
inheritance. A notable exception is inheritance of preserved empty
passwords to the context with statically configured certificates:
server {
proxy_ssl_certificate $ssl_server_name.crt;
proxy_ssl_certificate_key $ssl_server_name.key;
location / {
proxy_pass ...;
proxy_ssl_certificate example.com.crt;
proxy_ssl_certificate_key example.com.key;
}
}
In this case, an unmodified version (NULL) of empty passwords is set,
to allow reading them from the password prompt on nginx startup.
As an additional optimization, a preserved instance of inherited
configured passwords is set to the previous level, to inherit it
to other contexts:
server {
proxy_ssl_password_file password;
location /1/ {
proxy_pass https://u1/;
proxy_ssl_certificate $ssl_server_name.crt;
proxy_ssl_certificate_key $ssl_server_name.key;
}
location /2/ {
proxy_pass https://u2/;
proxy_ssl_certificate $ssl_server_name.crt;
proxy_ssl_certificate_key $ssl_server_name.key;
}
}
It was possible to write outside of the buffer used to keep UTF-8
decoded values when parsing conversion table configuration.
Since this happened before UTF-8 decoding, the fix is to check in
advance if character codes are of more than 3-byte sequence. Note
that this is already enforced by a later check for ngx_utf8_decode()
decoded values for 0xffff, which corresponds to the maximum value
encoded as a valid 3-byte sequence, so the fix does not affect the
valid values.
Found with AddressSanitizer.
Fixes GitHub issue #529.
As uncovered by recent addition in slice.t, a partially initialized
context, coupled with HTTP 206 response from stub backend, might be
accessed in the next slice subrequest.
Found by bad memory allocator simulation.
Upstream SSL sessions may be of a noticeably larger size with tickets
in TLSv1.2 and older versions, or with "stateless" tickets in TLSv1.3,
if a client certificate is saved into the session. Further, certain
stateless session resumption implemetations may store additional data.
Such one is JDK, known to also include server certificates in session
ticket data, which roughly doubles a decoded session size to slightly
beyond the previous limit. While it's believed to be an issue on the
JDK side, this change allows to save such sessions.
Another, innocent case is using RSA certificates with 8192 key size.
Previously, request might be left in inconsistent state in case of error,
which manifested in "http request count is zero" alerts when used by SSI
filter.
The fix is to reshuffle initialization order to postpone committing state
changes until after any potentially failing parts.
Found by bad memory allocator simulation.
In OpenSSL, session resumption always happens in the default SSL context,
prior to invoking the SNI callback. Further, unlike in TLSv1.2 and older
protocols, SSL_get_servername() returns values received in the resumption
handshake, which may be different from the value in the initial handshake.
Notably, this makes the restriction added in b720f650b insufficient for
sessions resumed with different SNI server name.
Considering the example from b720f650b, previously, a client was able to
request example.org by presenting a certificate for example.org, then to
resume and request example.com.
The fix is to reject handshakes resumed with a different server name, if
verification of client certificates is enabled in a corresponding server
configuration.
The directive sets a timeout during which a keepalive connection will
not be closed by nginx for connection reuse or graceful shutdown.
The change allows clients that send multiple requests over the same
connection without delay or with a small delay between them, to avoid
receiving a TCP RST in response to one of them. This excludes network
issues and non-graceful shutdown. As a side-effect, it also addresses
the TCP reset problem described in RFC 9112, Section 9.6, when the last
sent HTTP response could be damaged by a followup TCP RST. It is important
for non-idempotent requests, which cannot be retried by client.
It is not recommended to set keepalive_min_timeout to large values as
this can introduce an additional delay during graceful shutdown and may
restrict nginx from effective connection reuse.
The build location of the resulting libatomic_ops.a was changed in v7.4.0
after converting libatomic_ops to use libtool. The fix is to use library
from the install path, this allows building with both old and new versions.
Initially reported here:
https://mailman.nginx.org/pipermail/nginx/2018-April/056054.html
This can happen with certificates and certificate keys specified
with variables due to partial cache update in various scenarios:
- cache expiration with only one element of pair evicted
- on-disk update with non-cacheable encrypted keys
- non-atomic on-disk update
The fix is to retry with fresh data on X509_R_KEY_VALUES_MISMATCH.
A new directive "ssl_certificate_cache max=N [valid=time] [inactive=time]"
enables caching of SSL certificate chain and secret key objects specified
by "ssl_certificate" and "ssl_certificate_key" directives with variables.
Co-authored-by: Aleksei Bavshin <a.bavshin@nginx.com>
SSL object cache, as previously introduced in 1.27.2, did not take
into account encrypted certificate keys that might be unexpectedly
fetched from the cache regardless of the matching passphrase. To
avoid this, caching of encrypted certificate keys is now disabled
based on the passphrase callback invocation.
A notable exception is encrypted certificate keys configured without
ssl_password_file. They are loaded once resulting in the passphrase
prompt on startup and reused in other contexts as applicable.
Memory based objects are always inherited, engine based objects are
never inherited to adhere the volatile nature of engines, file based
objects are inherited subject to modification time and file index.
The previous behaviour to bypass cache from the old configuration cycle
is preserved with a new directive "ssl_object_cache_inheritable off;".
It now uses 5/4 times more memory for the pending buffer.
Further, a single allocation is now used, which takes additional 56 bytes
for deflate_allocs in 64-bit mode aligned to 16, to store sub-allocation
pointers, and the total allocation size now padded up to 128 bytes, which
takes theoretically 200 additional bytes in total. This fits though into
"4 * (64 + sizeof(void*))" additional space for ZALLOC used in zlib-ng
2.1.x versions. The comment was updated to reflect this.
While trying to close a stream in ngx_quic_close_streams() by calling its
read event handler, the next stream saved prior to that could be destroyed
recursively. This caused a segfault while trying to access the next stream.
The way the next stream could be destroyed in HTTP/3 is the following.
A request stream read event handler ngx_http_request_handler() could
end up calling ngx_http_v3_send_cancel_stream() to report a cancelled
request stream in the decoder stream. If sending stream cancellation
decoder instruction fails for any reason, and the decoder stream is the
next in order after the request stream, the issue is triggered.
The fix is to postpone calling read event handlers for all streams being
closed to avoid closing a released stream.
Previously, such packets were treated as long header packets with unknown
version 0, and a version negotiation packet was sent in response. This
could be used to set up an infinite traffic reflect loop with another nginx
instance.
Now version negotiation packets are ignored. As per RFC 9000, Section 6.1:
An endpoint MUST NOT send a Version Negotiation packet in response to
receiving a Version Negotiation packet.
Since 0-RTT and 1-RTT data exist in the same packet number space,
ngx_quic_discard_ctx incorrectly discards 1-RTT packets when
0-RTT keys are discarded.
The issue was introduced by 58b92177e7.
In particular, an untagged CAPABILITY response as described in the
interim RFC 3501 internet drafts was seen in various IMAP servers.
Previously resulted in a broken connection, now an untagged response
is proxied to client.
When client address is received, IPv6 address could be specified without
square brackets and without port, as well as both with the brackets and
port. The change allows IPv6 in square brackets and no port, which was
previously considered an error. This format conforms to RFC 3986.
The change also affects proxy_bind and friends.
Renaming a temporary file to an empty path ("") returns NGX_ENOPATH
with a subsequent ngx_create_full_path() to create the full path.
This function skips initial bytes as part of path separator lookup,
which causes out of bounds access on short strings.
The fix is to avoid renaming a temporary file to an obviously invalid
path, as well as explicitly forbid such syntax for literal values.
Although Coverity reports about potential type underflow, it is not
actually possible because the terminating '\0' is always included.
Notably, the run-time check is sufficient enough for Win32 as well.
Other short invalid values result either in NGX_ENOENT or NGX_EEXIST
and "MoveFile() .. failed" critical log messages, which involves a
separate error handling.
Prodded by Coverity (CID 1605485).
This simplifies merging protocol values after ea15896 and ebd18ec.
Further, as outlined in ebd18ec18, for libraries preceeding TLSv1.2+
support, only meaningful versions TLSv1 and TLSv1.1 are set by default.
While here, fixed indentation.
When cropping stsc atom, it's assumed that chunk index is never 0.
Based on this assumption, start_chunk and end_chunk are calculated
by subtracting 1 from it. If chunk index is zero, start_chunk or
end_chunk may underflow, which will later trigger
"start/end time is out mp4 stco chunks" error. The change adds an
explicit check for zero chunk index to avoid underflow and report
a proper error.
Zero chunk index is explicitly banned in ISO/IEC 14496-12, 8.7.4
Sample To Chunk Box. It's also implicitly banned in QuickTime File
Format specification. Description of chunk offset table references
"Chunk 1" as the first table element.
Currently an error is triggered if any of the chunk runs in stsc are
unordered. This however does not include the final chunk run, which
ends with trak->chunks + 1. The previous chunk index can be larger
leading to a 32-bit overflow. This could allow to skip the validity
check "if (start_sample > n)". This could later lead to a large
trak->start_chunk/trak->end_chunk, which would be caught later in
ngx_http_mp4_update_stco_atom() or ngx_http_mp4_update_co64_atom().
While there are no implications of the validity check being avoided,
the change still adds a check to ensure the final chunk run is ordered,
to produce a meaningful error and avoid a potential integer overflow.
A specially crafted mp4 file with an empty run of chunks in the stsc atom
and a large value for samples per chunk for that run, combined with a
specially crafted request, allowed to store that large value in prev_samples
and later in trak->end_chunk_samples while in ngx_http_mp4_crop_stsc_data().
Later in ngx_http_mp4_update_stsz_atom() this could result in buffer
overread while calculating trak->end_chunk_samples_size.
Now the value of samples per chunk specified for an empty run is ignored.
Absolute paths in links end up being rooted at github.com.
The contributing guidelines link is broken unless we use the full URL.
Also, remove superfluous "monospace formatting" for the link.
Previously, all upstream DNS entries would be immediately re-resolved
on config reload. With a large number of upstreams, this creates
a spike of DNS resolution requests. These spikes can overwhelm the
DNS server or cause drops on the network.
This patch retains the TTL of previous resolutions across reloads
by copying each upstream's name's expiry time across configuration
cycles. As a result, no additional resolutions are needed.
After configuration is reloaded, it may take some time for the
re-resolvable upstream servers to resolve and become available
as peers. During this time, client requests might get dropped.
Such servers are now pre-resolved using the "cache" of already
resolved peers from the old shared memory zone.
Specifying the upstream server by a hostname together with the
"resolve" parameter will make the hostname to be periodically
resolved, and upstream servers added/removed as necessary.
This requires a "resolver" at the "http" configuration block.
The "resolver_timeout" parameter also affects when the failed
DNS requests will be attempted again. Responses with NXDOMAIN
will be attempted again in 10 seconds.
Upstream has a configuration generation number that is incremented each
time servers are added/removed to the primary/backup list. This number
is remembered by the peer.init method, and if peer.get detects a change
in configuration, it returns NGX_BUSY.
Each server has a reference counter. It is incremented by peer.get and
decremented by peer.free. When a server is removed, it is removed from
the list of servers and is marked as "zombie". The memory allocated by
a zombie peer is freed only when its reference count becomes zero.
Co-authored-by: Roman Arutyunyan <arut@nginx.com>
Co-authored-by: Sergey Kandaurov <pluknet@nginx.com>
Co-authored-by: Vladimir Homutov <vl@nginx.com>
TLSv1 and TLSv1.1 are formally deprecated and forbidden to negotiate due
to insufficient security reasons outlined in RFC 8996.
TLSv1 and TLSv1.1 are disabled in BoringSSL e95b0cad9 and LibreSSL 3.8.1
in the way they cannot be enabled in nginx configuration. In OpenSSL 3.0,
they are only permitted at security level 0 (disabled by default).
The support is dropped in Chrome 84, Firefox 78, and deprecated in Safari.
This change disables TLSv1 and TLSv1.1 by default for OpenSSL 1.0.1 and
newer, where TLSv1.2 support is available. For older library versions,
which do not have alternatives, these protocol versions remain enabled.
Since a2a513b93c, stream frames no longer need to be retransmitted after it
was deleted. The frames which were retransmitted before, could be stream data
frames sent prior to a RESET_STREAM. Such retransmissions are explicitly
prohibited by RFC 9000, Section 19.4.
EVP_KEY objects are a reference-counted container for key material, shallow
copies and OpenSSL stack management aren't needed as with certificates.
Based on previous work by Mini Hawthorne.
Certificate chains are now loaded once.
The certificate cache provides each chain as a unique stack of reference
counted elements. This shallow copy is required because OpenSSL stacks
aren't reference counted.
Based on previous work by Mini Hawthorne.
Added ngx_openssl_cache_module, which indexes a type-aware object cache.
It maps an id to a unique instance, and provides references to it, which
are dropped when the cycle's pool is destroyed.
The cache will be used in subsequent patches.
Based on previous work by Mini Hawthorne.
Instead of cross-linking the objects using exdata, pointers to configured
certificates are now stored in ngx_ssl_t, and OCSP staples are now accessed
with rbtree in it. This allows sharing these objects between SSL contexts.
Based on previous work by Mini Hawthorne.
Starting from TLSv1.1 (as seen since draft-ietf-tls-rfc2246-bis-00),
the "certificate_authorities" field grammar of the CertificateRequest
message was redone to allow no distinguished names. In TLSv1.3, with
the restructured CertificateRequest message, this can be similarly
done by optionally including the "certificate_authorities" extension.
This allows to avoid sending DNs at all.
In practice, aside from published TLS specifications, all supported
SSL/TLS libraries allow to request client certificates with an empty
DN list for any protocol version. For instance, when operating in
TLSv1, this results in sending the "certificate_authorities" list as
a zero-length vector, which corresponds to the TLSv1.1 specification.
Such behaviour goes back to SSLeay.
The change relaxes the requirement to specify at least one trusted CA
certificate in the ssl_client_certificate directive, which resulted in
sending DNs of these certificates (closes#142). Instead, all trusted
CA certificates can be specified now using the ssl_trusted_certificate
directive if needed. A notable difference that certificates specified
in ssl_trusted_certificate are always loaded remains (see 3648ba7db).
Co-authored-by: Praveen Chaudhary <praveenc@nvidia.com>
Pushes to master and stable branches will result in buildbot-like checks
on multiple OSes and architectures.
Pull requests will be checked on a public Ubuntu GitHub runner.
Unordered chunks could result in trak->end_chunk smaller than trak->start_chunk
in ngx_http_mp4_crop_stsc_data(). Later in ngx_http_mp4_update_stco_atom()
this caused buffer overread while trying to calculate trak->end_offset.
While cropping an stsc atom in ngx_http_mp4_crop_stsc_data(), a 32-bit integer
overflow could happen, which could result in incorrect seeking and a very large
value stored in "samples". This resulted in a large invalid value of
trak->end_chunk_samples. This value is further used to calculate the value of
trak->end_chunk_samples_size in ngx_http_mp4_update_stsz_atom(). While doing
this, a large invalid value of trak->end_chunk_samples could result in reading
memory before stsz atom start. This could potentially result in a segfault.
In some rare cases, graceful shutdown may happen while initializing an HTTP/2
connection. Previously, such a connection ignored the shutdown and remained
active. Now it is gracefully closed prior to processing any streams to
eliminate the shutdown delay.
Previously handlers were mandatory. However they are not always needed.
For example, a server configured with ssl_reject_handshake does not need a
handler. Such servers required a fake handler to pass the check. Now handler
absence check is moved to runtime. If handler is missing, the connection is
closed with 500 code.
Previously the last chain field of ngx_quic_buffer_t could still reference freed
chains and buffers after calling ngx_quic_free_buffer(). While normally an
ngx_quic_buffer_t object should not be used after freeing, resetting last_chain
field would prevent a potential use-after-free.
Sending handshake-level CRYPTO frames after the client's Finished message could
lead to memory disclosure and a potential segfault, if those frames are sent in
one packet with the Finished frame.
While inserting a new entry into the dynamic table, first the entry is added,
and then older entries are evicted until table size is within capacity. After
the first step, the number of entries may temporarily exceed the maximum
calculated from capacity by one entry, which previously caused table overflow.
The easiest way to trigger the issue is to keep adding entries with empty names
and values until first eviction.
The issue was introduced by 987bee4363d1.
Previously a decoder stream was created on demand for sending Section
Acknowledgement, Stream Cancellation and Insert Count Increment. If conditions
for sending any of these instructions never happen, a decoder stream is not
created at all. These conditions include client not using the dynamic table and
no streams abandoned by server (RFC 9204, Section 2.2.2.2). However RFC 9204,
Section 4.2 defines only one condition for not creating a decoder stream:
An endpoint MAY avoid creating a decoder stream if its decoder sets
the maximum capacity of the dynamic table to zero.
The change enables pre-creation of the decoder stream at HTTP/3 session
initialization if maximum dynamic table capacity is not zero. Note that this
value is currently hardcoded to 4096 bytes and is not configurable, so the
stream is now always created.
Also, the change fixes a potential stack overflow when creating a decoder
stream in ngx_http_v3_send_cancel_stream() while draining a request stream by
ngx_drain_connections(). Creating a decoder stream involves calling
ngx_get_connection(), which calls ngx_drain_connections(), which will drain the
same request stream again. If client's MAX_STREAMS for uni stream is high
enough, these recursive calls will continue until we run out of stack.
Otherwise, decoder stream creation will fail at some point and the request
stream connection will be drained. This may result in use-after-free, since
this connection could still be referenced up the stack.
Previously chain links could sometimes be dropped instead of being reused,
which could result in increased memory consumption during long requests.
A similar chain link issue in ngx_http_gzip_filter_module was fixed in
da46bfc484ef (1.11.10).
Based on a patch by Sangmin Lee.
Using "long *" instead of "AO_t *" leads either to -Wincompatible-pointer-types
or -Wpointer-sign warnings, depending on whether long and size_t are compatible
types (e.g., ILP32 versus LP64 data models). Notably, -Wpointer-sign warnings
are enabled by default in Clang only, and -Wincompatible-pointer-types is an
error starting from GCC 14.
Signed-off-by: Edgar Bonet <bonet@grenoble.cnrs.fr>
Passing from udp was not possible for the most part due to preread buffer
restriction. Passing to udp could occasionally work, but the connection would
still be bound to the original listen rbtree, which prevented it from being
deleted on connection closure.
When loading certificate keys via ENGINE_load_private_key() in runtime,
it was possible to overwrite configuration on ENGINE_by_id() failure.
OpenSSL documention doesn't describe errors in details, the only reason
I found in the comment to example is when the engine is not available.
Previously, a request body larger than declared in Content-Length resulted in
a 413 status code, because Content-Length was mistakenly used as the maximum
allowed request body, similar to client_max_body_size. Following the HTTP/3
specification, such requests are now rejected with the 400 error as malformed.
The ngx_quic_run() function uses qc->close timer to limit the handshake
duration. Normally it is removed by ngx_quic_do_init_streams() which is
called once when we are done with initial SSL processing.
The problem happens when the client sends early data and streams are
initialized in the ngx_quic_run() -> ngx_quic_handle_datagram() call.
The order of set/remove timer calls is now reversed; the close timer is
set up and the timer fires when assigned, starting the unexpected connection
close process.
The fix is to skip setting the timer if streams were initialized during
handling of the initial datagram. The idle timer for quic is set anyway,
and stream-related timeouts are managed by application layer.
Notably, Apple Silicon CPUs have 128 byte cache line size,
which is twice the default configured for generic aarch64.
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Previously, the response text wasn't initialized and the rewrite module
was sending response body set to NULL.
Found with UndefinedBehaviorSanitizer (pointer-overflow).
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
Previously, it could result when left-shifting signed integer due to implicit
integer promotion, such that the most significant bit appeared on the sign bit.
In practice, though, this results in the same left value as with an explicit
cast, at least on known compilers, such as GCC and Clang. The reason is that
in_addr_t, which is equivalent to uint32_t and same as "unsigned int" in ILP32
and LP64 data type models, has the same type width as the intermediate after
integer promotion, so there's no side effects such as sign-extension. This
explains why adding an explicit cast does not change object files in practice.
Found with UndefinedBehaviorSanitizer (shift).
Based on a patch by Piotr Sikora.
While copying ngx_http_variable_value_t structures to geo binary base
in ngx_http_geo_copy_values(), and similarly in the stream module,
uninitialized parts of these structures are copied as well. These
include the "escape" field and possible holes. Calculating crc32 of
this data triggers uninitialized memory access.
Found with MemorySanitizer.
Signed-off-by: Piotr Sikora <piotr@aviatrix.com>
In preparation for adding more parameters to the listen directive,
and to be in sync with the corresponding structure in the http module.
No functional changes.
Originally, the stream module was developed based on the mail module,
following the existing style. Then it was diverged to closely follow
the http module development. This change updates style to use sscf
naming convention troughout the stream module, which matches the http
module code style. No functional changes.
The module allows to pass connections from Stream to other modules such as HTTP
or Mail, as well as back to Stream. Previously, this was only possible with
proxying. Connections with preread buffer read out from socket cannot be
passed.
The module allows selective SSL termination based on SNI.
stream {
server {
listen 8000 default_server;
ssl_preread on;
...
}
server {
listen 8000;
server_name foo.example.com;
pass 127.0.0.1:8001; # to HTTP
}
server {
listen 8000;
server_name bar.example.com;
...
}
}
http {
server {
listen 8001 ssl;
...
location / {
root html;
}
}
}
Server name is taken either from ngx_stream_ssl_module or
ngx_stream_ssl_preread_module.
The change adds "default_server" parameter to the "listen" directive,
as well as the following directives: "server_names_hash_max_size",
"server_names_hash_bucket_size", "server_name" and "ssl_reject_handshake".
Previously, preread buffer was always read out from socket, which made it
impossible to terminate SSL on the connection without introducing additional
SSL BIOs. The following patches will rely on this.
Now, when possible, recv(MSG_PEEK) is used instead, which keeps data in socket.
It's called if SSL is not already terminated and if an egde-triggered event
method is used. For epoll, EPOLLRDHUP support is also required.
Stream connection cleanup handler ngx_quic_stream_cleanup_handler() calls
ngx_quic_shutdown_stream() after which it resets the pointer from quic stream
to the connection (sc->connection = NULL). Previously if this call failed,
sc->connection retained the old value, while the connection was freed by the
application code. This resulted later in a second attempt to close the freed
connection, which lead to allocator double free error.
The fix is to reset the sc->connection pointer in case of error.
Inspired by RFC 9001, Section 6.3, trial packet decryption with the current
keys is now used to avoid a timing side-channel signal. Further, this fixes
segfault while accessing missing next keys (ticket #2585).
Previously if an MTU probe send failed early in ngx_quic_frame_sendto()
due to allocation error or congestion control, the application level packet
number was not increased, but was still saved as MTU probe packet number.
Later when a packet with this number was acknowledged, the unsent MTU probe
was acknowledged as well. This could result in discovering a bigger MTU than
supported by the path, which could lead to EMSGSIZE (Message too long) errors
while sending further packets.
The problem existed since PMTUD was introduced in 58afcd72446f (1.25.2).
Back then only the unlikely memory allocation error could trigger it. However
in efcdaa66df2e congestion control was added to ngx_quic_frame_sendto() which
can now trigger the issue with a higher probability.
Now "fastopen", "backlog", "accept_filter", "deferred", and "so_keepalive"
parameters are not allowed with "quic" in the "listen" directive.
Reported by Izorkin.
When filter finalization is triggered when working with an upstream server,
and error_page redirects request processing to some simple handler,
ngx_http_request_finalize() triggers request termination when the response
is sent. In particular, via the upstream cleanup handler, nginx will close
the upstream connection and the corresponding socket.
Still, this can happen to be with ngx_event_pipe() on stack. While
the code will set p->downstream_error due to NGX_ERROR returned from the
output filter chain by filter finalization, otherwise the error will be
ignored till control returns to ngx_http_upstream_process_request().
And event pipe might try reading from the (already closed) socket, resulting
in "readv() failed (9: Bad file descriptor) while reading upstream" errors
(or even segfaults with SSL).
Such errors were seen with the following configuration:
location /t2 {
proxy_pass http://127.0.0.1:8080/big;
image_filter_buffer 10m;
image_filter resize 150 100;
error_page 415 = /empty;
}
location /empty {
return 204;
}
location /big {
# big enough static file
}
Fix is to clear p->upstream in ngx_http_upstream_finalize_request(),
and ensure that p->upstream is checked in ngx_event_pipe_read_upstream()
and when handling events at ngx_event_pipe() exit.
When a request was terminated due to an error via ngx_http_terminate_request()
while an AIO operation was running in a subrequest, various issues were
observed. This happened because ngx_http_request_finalizer() was only set
in the subrequest where ngx_http_terminate_request() was called, but not
in the subrequest where the AIO operation was running. After completion
of the AIO operation normal processing of the subrequest was resumed, leading
to issues.
In particular, in case of the upstream module, termination of the request
called upstream cleanup, which closed the upstream connection. Attempts to
further work with the upstream connection after AIO operation completion
resulted in segfaults in ngx_ssl_recv(), "readv() failed (9: Bad file
descriptor) while reading upstream" errors, or socket leaks.
In ticket #2555, issues were observed with the following configuration
with cache background update (with thread writing instrumented to
introduce a delay, when a client closes the connection during an update):
location = /background-and-aio-write {
proxy_pass ...
proxy_cache one;
proxy_cache_valid 200 1s;
proxy_cache_background_update on;
proxy_cache_use_stale updating;
aio threads;
aio_write on;
limit_rate 1000;
}
Similarly, the same issue can be seen with SSI, and can be caused by
errors in subrequests, such as in the following configuration
(where "/proxy" uses AIO, and "/sleep" returns 444 after some delay,
causing request termination):
location = /ssi-active-boom {
ssi on;
ssi_types *;
return 200 '
<!--#include virtual="/proxy" -->
<!--#include virtual="/sleep" -->
';
limit_rate 1000;
}
Or the same with both AIO operation and the error in non-active subrequests
(which needs slightly different handling, see below):
location = /ssi-non-active-boom {
ssi on;
ssi_types *;
return 200 '
<!--#include virtual="/static" -->
<!--#include virtual="/proxy" -->
<!--#include virtual="/sleep" -->
';
limit_rate 1000;
}
Similarly, issues can be observed with just static files. However,
with static files potential impact is limited due to timeout safeguards
in ngx_http_writer(), and the fact that c->error is set during request
termination.
In a simple configuration with an AIO operation in the active subrequest,
such as in the following configuration, the connection is closed right
after completion of the AIO operation anyway, since ngx_http_writer()
tries to write to the connection and fails due to c->error set:
location = /ssi-active-static-boom {
ssi on;
ssi_types *;
return 200 '
<!--#include virtual="/static-aio" -->
<!--#include virtual="/sleep" -->
';
limit_rate 1000;
}
In the following configuration, with an AIO operation in a non-active
subrequest, the connection is closed only after send_timeout expires:
location = /ssi-non-active-static-boom {
ssi on;
ssi_types *;
return 200 '
<!--#include virtual="/static" -->
<!--#include virtual="/static-aio" -->
<!--#include virtual="/sleep" -->
';
limit_rate 1000;
}
Fix is to introduce r->main->terminated flag, which is to be checked
by AIO event handlers when the r->main->blocked counter is decremented.
When the flag is set, handlers are expected to wake up the connection
instead of the subrequest (which might be already cleaned up).
Additionally, now ngx_http_request_finalizer() is always set in the
active subrequest, so waking up the connection properly finalizes the
request even if termination happened in a non-active subrequest.
Each AIO (thread IO) operation being run is now accompanied with 1-minute
timer. This timer prevents unexpected shutdown of the worker process while
an AIO operation is running, and logs an alert if the operation is running
for too long.
This fixes "open socket left" alerts during worker processes shutdown
due to pending AIO (or thread IO) operations while corresponding requests
have no timers. In particular, such errors were observed while reading
cache headers (ticket #2162), and with worker_shutdown_timeout.
When graceful shutdown was requested, and then nginx was forced to
do fast shutdown, it used to (incorrectly) complain about open sockets
left in connections which weren't yet closed when fast shutdown
was requested.
Fix is to avoid complaining about open sockets when fast shutdown was
requested after graceful one. Abnormal termination, if requested with
the WINCH signal, can still happen though.
OPENSSL_VERSION_NUMBER is now redefined to 0x1010000fL for LibreSSL 3.5.0
and above. Building with older LibreSSL versions, such as 2.8.0, may now
produce warnings (see cab37803ebb3) and may require appropriate compiler
options to suppress them.
Notably, this allows to start using SSL_get0_verified_chain() appeared
in OpenSSL 1.1.0 and LibreSSL 3.5.0, without additional macro tests.
Prodded by Ilya Shipitsin.
Similar to 7356:e3ba4026c02d, as long as SSL_OP_NO_CLIENT_RENEGOTIATION
is defined, it is the library responsibility to prevent renegotiation.
Additionally, this allows to raise LibreSSL version used to redefine
OPENSSL_VERSION_NUMBER to 0x1010000fL, such that this won't result in
attempts to dereference SSL objects made opaque in LibreSSL 3.4.0.
Patch by Maxim Dounin.
rand() is used on win32. RAND_MAX is implementation defined. win32's is
0x7fff.
Existing uses of ngx_random() rely upon 0x7fffffff range provided by
POSIX implementations of random().
On-packet acknowledgement is made path aware, as per RFC 9000, Section 9.4:
Packets sent on the old path MUST NOT contribute to congestion control
or RTT estimation for the new path.
To make this possible in a single congestion control context, the first packet
to be sent after the new path has been validated, which includes resetting the
congestion controller and RTT estimator, is now remembered in the connection.
Packets sent previously, such as on the old path, are not taken into account.
Note that although the packet number is saved per-connection, the added checks
affect application level packets only. For non-application level packets,
which are only processed prior to the handshake is complete, the remembered
packet number remains set to zero.
As per RFC 9000, Section 8.2.1:
When an endpoint is unable to expand the datagram size to 1200 bytes due
to the anti-amplification limit, the path MTU will not be validated.
To ensure that the path MTU is large enough, the endpoint MUST perform a
second path validation by sending a PATH_CHALLENGE frame in a datagram of
at least 1200 bytes.
Previously ngx_quic_frame_sendto() ignored congestion control and did not
contribute to in_flight counter.
Now congestion control window is checked unless ignore_congestion flag is set.
Also, in_flight counter is incremented and the frame is stored in ctx->sent
queue if it's ack-eliciting. This behavior is now similar to
ngx_quic_output_packet().
According to RFC 9000, an endpoint SHOULD NOT send multiple PATH_CHALLENGE
frames in a single packet. The change adds a check to enforce this claim to
optimize server behavior. Previously each PATH_CHALLENGE always resulted in a
single response datagram being sent to client. The effect of this was however
limited by QUIC flood protection.
Also, PATH_CHALLENGE is explicitly disabled in Initial and Handshake levels,
see RFC 9000, Table 3. However, technically it may be sent by client in 0-RTT
over a new path without actual migration, even though the migration itself is
prohibited during handshake. This allows client to coalesce multiple 0-RTT
packets each carrying a PATH_CHALLENGE and end up with multiple PATH_CHALLENGEs
per datagram. This again leads to suboptimal behavior, see above. Since the
purpose of sending PATH_CHALLENGE frames in 0-RTT is unclear, these frames are
now only allowed in 1-RTT. For 0-RTT they are silently ignored.
Previously, when using ngx_quic_frame_sendto() to explicitly send a packet with
a single frame, anti-amplification limit was not properly enforced. Even when
there was no quota left for the packet, it was sent anyway, but with no padding.
Now the packet is not sent at all.
This function is called to send PATH_CHALLENGE/PATH_RESPONSE, PMTUD and probe
packets. For all these cases packet send is retried later in case the send was
not successful.
By default packets with these frames are expanded to 1200 bytes. Previously,
if anti-amplification limit did not allow this expansion, it was limited to
whatever size was allowed. However RFC 9000 clearly states no partial
expansion should happen in both cases.
Section 8.2.1. Initiating Path Validation:
An endpoint MUST expand datagrams that contain a PATH_CHALLENGE frame
to at least the smallest allowed maximum datagram size of 1200 bytes,
unless the anti-amplification limit for the path does not permit
sending a datagram of this size.
Section 8.2.2. Path Validation Responses:
An endpoint MUST expand datagrams that contain a PATH_RESPONSE frame
to at least the smallest allowed maximum datagram size of 1200 bytes.
...
However, an endpoint MUST NOT expand the datagram containing the
PATH_RESPONSE if the resulting data exceeds the anti-amplification limit.
If URI is not fully parsed yet, some pointers are not set. As a result,
the calculation of "new + (ptr - old)" expression is flawed.
According to C11, 6.5.6 Additive operators, p.9:
: When two pointers are subtracted, both shall point to elements
: of the same array object, or one past the last element of the
: array object
Since "ptr" is not set, subtraction leads to undefined behaviour, because
"ptr" and "old" are not in the same buffer (i.e. array objects).
Prodded by GCC undefined behaviour sanitizer.
Neither r->port_start nor r->port_end were ever used.
The r->port_end is set by the parser, though it was never used by
the following code (and was never usable, since not copied by the
ngx_http_alloc_large_header_buffer() without r->port_start set).
Currently, packets generated by ngx_quic_frame_sendto() and
ngx_quic_send_early_cc() are not logged, thus making it hard
to read logs due to gaps appearing in packet numbers sequence.
At frames level, it is handy to see immediately packet number
in which they arrived or being sent.
As part of normal HTTP/2 processing, incomplete frames are saved in the
control state using a fixed size memcpy of NGX_HTTP_V2_STATE_BUFFER_SIZE.
For this matter, two state buffers are reserved in the HTTP/2 recv buffer.
As part of HTTP/2 auto-detection on plain TCP connections, initial data
is first read into a buffer specified by the client_header_buffer_size
directive that doesn't have state reservation. Previously, this made it
possible to over-read the buffer as part of saving the state.
The fix is to read the available buffer size rather than a fixed size.
Although memcpy of a fixed size can produce a better optimized code,
handling of incomplete frames isn't a common execution path, so it was
sacrificed for the sake of simplicity of the fix.
It is made local as it is only needed now when creating crypto context.
BoringSSL lacks EVP interface for ChaCha20, providing instead
a function for one-shot encryption, thus hp is still preserved.
Based on a patch by Roman Arutyunyan.
Previously it was possible to generate ACK frames using formally discarded
protection keys, in particular, when acknowledging a client Handshake packet
used to complete the TLS handshake and to discard handshake protection keys.
As it happens late in packet processing, it could be possible to generate ACK
frames after the keys were already discarded.
ACK frames are generated from ngx_quic_ack_packet(), either using a posted
push event, which envolves ngx_quic_generate_ack() as a part of the final
packet assembling, or directly in ngx_quic_ack_packet(), such as when there
is no room to add a new ACK range or when the received packet is out of order.
The added keys availability check is used to avoid generating late ACK frames
in both cases.
In addition to triggering alert, it ensures that such packets won't be sent.
With the previous change that marks server keys as discarded by zeroing the
key lengh, it is now an error to send packets with discarded keys. OpenSSL
based stacks tolerate such behaviour because key length isn't used in packet
protection, but BoringSSL will raise the UNSUPPORTED_KEY_SIZE cipher error.
It won't be possible to use discarded keys with reused crypto contexts as it
happens in subsequent changes.
Keys may be released by TLS stack in different times, so it makes sense
to check this independently as well. This allows to fine-tune what key
direction is used when checking keys availability.
When discarding, server keys are now marked in addition to client keys.
This improves nginx startup times significantly when using very large number
of locations due to computational complexity of the sorting algorithm being
used: insertion sort is O(n*n) on average, while merge sort is O(n*log(n)).
In particular, in a test configuration with 20k locations total startup
time is reduced from 8 seconds to 0.9 seconds.
Prodded by Yusuke Nojima,
https://mailman.nginx.org/pipermail/nginx-devel/2023-September/NUL3Y2FPPFSHMPTFTL65KXSXNTX3NQMK.html
In ngx_regex_cleanup() allocator wasn't configured when calling
pcre2_compile_context_free() and pcre2_match_data_free(), resulting
in no ngx_free() call and leaked memory. Fix is ensure that allocator
is configured for global allocations, so that ngx_free() is actually
called to free memory.
Additionally, ngx_regex_compile_context was cleared in
ngx_regex_module_init(). It should be either not cleared, so it will
be freed by ngx_regex_cleanup(), or properly freed. Fix is to
not clear it, so ngx_regex_cleanup() will be able to free it.
Reported by ZhenZhong Wu,
https://mailman.nginx.org/pipermail/nginx-devel/2023-September/3Z5FIKUDRN2WBSL3JWTZJ7SXDA6YIWPB.html
To ensure that attempts to flood servers with many streams are detected
early, a limit of no more than 2 * max_concurrent_streams new streams per one
event loop iteration was introduced. This limit is applied even if
max_concurrent_streams is not yet reached - for example, if corresponding
streams are handled synchronously or reset.
Further, refused streams are now limited to maximum of max_concurrent_streams
and 100, similarly to priority_limit initial value, providing some tolerance
to clients trying to open several streams at the connection start, yet
low tolerance to flooding attempts.
The error may be triggered in add_handhshake_data() by incorrect transport
parameter sent by client. The expected behaviour in this case is to close
connection complaining about incorrect parameter. Currently the connection
just times out.
Previously, the timer was never reset due to an explicit check. The check was
added in 36b59521a41c as part of connection close simplification. The reason
was to retain the earliest timeout. However, the timeouts are all the same
while QUIC handshake is in progress and resetting the timer for the same value
has no performance implications. After handshake completion there's only
application level. The change removes the check.
Now the session object is assigned to c->data while ngx_http_connection_t
object is referenced by its http_connection field, similar to
ngx_http_v2_connection_t and ngx_http_request_t.
The change allows to eliminate v3_session field from ngx_http_connection_t.
The field was under NGX_HTTP_V3 macro, which was a source of binary
compatibility problems when nginx/module is build with/without HTTP/3 support.
Postponing is essential since c->data should retain the reference to
ngx_http_connection_t object throughout QUIC handshake, because SSL callbacks
ngx_http_ssl_servername() and ngx_http_ssl_alpn_select() rely on this.
Instead, when worker is shutting down and handshake is not yet completed,
connection is terminated immediately.
Previously the callback could be called while QUIC handshake was in progress
and, what's more important, before the init() callback. Now it's postponed
after init().
This change is a preparation to postponing HTTP/3 session creation to init().
Previously QUIC did not have such parameter and handshake duration was
controlled by HTTP/3. However that required creating and storing HTTP/3
session on first client datagram. Apparently there's no convenient way to
store the session object until QUIC handshake is complete. In the followup
patches session creation will be postponed to init() callback.
As explained in BoringSSL change[1], levels were introduced in the original
QUIC API to draw a line between when keys are released and when are active.
In the new QUIC API they are released in separate calls when it's needed.
BoringSSL has then a consideration to remove levels API, hence the change.
If not available e.g. from a QUIC packet header, levels can be taken based on
keys availability. The only real use of levels is to prevent using app keys
before they are active in QuicTLS that provides the old BoringSSL QUIC API,
it is replaced with an equivalent check of c->ssl->handshaked.
This change also removes OpenSSL compat shims since they are no longer used.
The only exception left is caching write level from the keylog callback in
the internal field which is a handy equivalent of checking keys availability.
[1] https://boringssl.googlesource.com/boringssl/+/1e859054
As per RFC 9000, section 10.2.3, to ensure that peer successfully removed
packet protection, CONNECTION_CLOSE can be sent in multiple packets using
different packet protection levels.
Now it is sent in all protection levels available.
This roughly corresponds to the following paragraph:
* Prior to confirming the handshake, a peer might be unable to process 1-RTT
packets, so an endpoint SHOULD send a CONNECTION_CLOSE frame in both Handshake
and 1-RTT packets. A server SHOULD also send a CONNECTION_CLOSE frame in an
Initial packet.
In practice, this change allows to avoid sending an Initial packet when we know
the client has handshake keys, by checking if we have discarded initial keys.
Also, this fixes sending CONNECTION_CLOSE when using QuicTLS with old QUIC API,
where TLS stack releases application read keys before handshake confirmation;
it is fixed by sending CONNECTION_CLOSE additionally in a Handshake packet.
Status header with an empty reason-phrase, such as "Status: 404 ", is
valid per CGI specification, but looses the trailing space during parsing.
Currently, this results in "HTTP/1.1 404" HTTP status line in the response,
which violates HTTP specification due to missing trailing space.
With this change, only the status code is used from such short Status
header lines, so nginx will generate status line itself, with the space
and appropriate reason phrase if available.
Reported at:
https://mailman.nginx.org/pipermail/nginx/2023-August/EX7G4JUUHJWJE5UOAZMO5UD6OJILCYGX.html
Previously, a socket error on a path being validated resulted in validation
error and subsequent QUIC connection closure. Now the error is ignored and
path validation proceeds as usual, with several retries and a timeout.
When validating the old path after an apparent migration, that path may already
be unavailable and sendmsg() may return an error, which should not result in
QUIC connection close.
When validating the new path, it's possible that the new client address is
spoofed (See RFC 9000, 9.3.2. On-Path Address Spoofing). This address may
as well be unavailable and should not trigger QUIC connection closure.
Previously, original dcid was used to receive initial client packets in case
server initial response was lost. However, last dcid should be used instead.
These two are the same unless retry is used. In case of retry, client resends
initial packet with a new dcid, that is different from the original dcid. If
server response is lost, the client resends this packet again with the same
dcid. This is shown in RFC 9000, 7.3. Authenticating Connection IDs, Figure 8.
The issue manifested itself with creating multiple server sessions in response
to each post-retry client initial packet, if server response is lost.
Since at least f9fbeb4ee0de and certainly after 924882f42dea, which
TLS Key Update support predates, queued data output is deferred to a
posted push handler. To address timing signals after these changes,
generating next keys is now posted to run after the push handler.
Previously, NGX_AGAIN returned by ngx_quic_send() was treated by
ngx_quic_frame_sendto() as error, which triggered errors in its callers.
However, a blocked socket is not an error. Now NGX_AGAIN is passed as is to
the ngx_quic_frame_sendto() callers, which can safely ignore it.
When probe timeout expired while congestion window was exhausted, probe PINGs
could not be sent. As a result, lost packets could not be declared lost and
congestion window could not be freed for new packets. This deadlock
continued until connection idle timeout expiration.
Now PINGs are sent separately from the frame queue without congestion control,
as specified by RFC 9002, Section 7:
An endpoint MUST NOT send a packet if it would cause bytes_in_flight
(see Appendix B.2) to be larger than the congestion window, unless the
packet is sent on a PTO timer expiration (see Section 6.2) or when entering
recovery (see Section 7.3.2).
Previously, PTO handler analyzed the first packet in the sent queue for the
timeout expiration. However, the last sent packet should be analyzed instead.
An example is timeout calculation in ngx_quic_set_lost_timer().
Previously the field pnum of a potentially freed frame was accessed. Now the
value is copied to a local variable. The old behavior did not cause any
problems since the frame memory is not freed, but is moved to a free queue
instead.
In non-GSO mode, a datagram is sent if congestion window is not exceeded by the
time of send. The window could be exceeded by a small amount after the send.
In GSO mode, congestion window was checked in a similar way, but for all
concatenated datagrams as a whole. This could result in exceeding congestion
window by a lot. Now congestion window is checked for every datagram in GSO
mode as well.
Previously it was added to the tail as all other frames. However, if the
amount of queued data is large, it could delay the delivery of ACK, which
could trigger frames retransmissions and slow down the connection.
Previously ACK was not generated if max_ack_delay was not yet expired and the
number of unacknowledged ack-eliciting packets was less than two, as allowed by
RFC 9000 13.2.1-13.2.2. However this only makes sense to avoid sending ACK-only
packets, as explained by the RFC:
On the other hand, reducing the frequency of packets that carry only
acknowledgments reduces packet transmission and processing cost at both
endpoints.
Now ACK is delayed only if output frame queue is empty. Otherwise ACK is sent
immediately, which significantly improves QUIC performance with certain tests.
With this change, the NGX_OPENSSL_NO_CONFIG macro is defined when nginx
is asked to build OpenSSL itself. And with this macro automatic loading
of OpenSSL configuration (from the build directory) is prevented unless
the OPENSSL_CONF environment variable is explicitly set.
Note that not loading configuration is broken in OpenSSL 1.1.1 and 1.1.1a
(fixed in OpenSSL 1.1.1b, see https://github.com/openssl/openssl/issues/7350).
If nginx is used to compile these OpenSSL versions, configuring nginx with
NGX_OPENSSL_NO_CONFIG explicitly set to 0 might be used as a workaround.
Following OpenSSL 0.9.8f, OpenSSL tries to load application-specific
configuration section first, and then falls back to the "openssl_conf"
default section if application-specific section is not found, by using
CONF_modules_load_file(CONF_MFLAGS_DEFAULT_SECTION). Therefore this
change is not expected to introduce any compatibility issues with existing
configurations. It does, however, make it easier to configure specific
OpenSSL settings for nginx in system-wide OpenSSL configuration
(ticket #2449).
Instead of checking OPENSSL_VERSION_NUMBER when using the OPENSSL_init_ssl()
interface, the code now tests for OPENSSL_INIT_LOAD_CONFIG to be defined and
true, and also explicitly excludes LibreSSL. This ensures that this interface
is not used with BoringSSL and LibreSSL, which do not provide additional
library initialization settings, notably the OPENSSL_INIT_set_config_appname()
call.
Similarly to 6822:c045b4926b2c, environment variables introduced with
the "env" directive (and "NGINX_BPF_MAPS" added by QUIC) are now allocated
via ngx_alloc(), and explicitly freed by a cleanup handler if no longer used.
In collaboration with Sergey Kandaurov.
Previously used constant EVP_GCM_TLS_TAG_LEN had misleading name since it was
used not only with GCM, but also with CHACHAPOLY. Now a new constant
NGX_QUIC_TAG_LEN introduced. Luckily all AEAD algorithms used by QUIC have
the same tag length of 16.
Previously, computing rttvar used an updated smoothed_rtt value as per
RFC 9002, section 5.3, which appears to be specified in a wrong order.
A technical errata ID 7539 is reported.
Although it has better implementation status than HTTP/3 server push,
it remains of limited use, with adoption numbers seen as negligible.
Per IETF 102 materials, server push was used only in 0.04% of sessions.
It was considered to be "difficult to use effectively" in RFC 9113.
Its use is further limited by badly matching to fetch/cache/connection
models in browsers, see related discussions linked from [1].
Server push was disabled in Chrome 106 [2].
The http2_push, http2_push_preload, and http2_max_concurrent_pushes
directives are made obsolete. In particular, this essentially reverts
7201:641306096f5b and 7207:3d2b0b02bd3d.
[1] https://jakearchibald.com/2017/h2-push-tougher-than-i-thought/
[2] https://chromestatus.com/feature/6302414934114304
It has been deprecated since 7270:46c0c7ef4913 (1.15.0) in favour of
the "ssl" parameter of the "listen" directive, which has been available
since 2224:109849282793 (0.7.14).
The directive enables HTTP/2 in the current server. The previous way to
enable HTTP/2 via "listen ... http2" is now deprecated. The new approach
allows to share HTTP/2 and HTTP/0.9-1.1 on the same port.
For SSL connections, HTTP/2 is now selected by ALPN callback based on whether
the protocol is enabled in the virtual server chosen by SNI. This however only
works since OpenSSL 1.0.2h, where ALPN callback is invoked after SNI callback.
For older versions of OpenSSL, HTTP/2 is enabled based on the default virtual
server configuration.
For plain TCP connections, HTTP/2 is now auto-detected by HTTP/2 preface, if
HTTP/2 is enabled in the default virtual server. If preface is not matched,
HTTP/0.9-1.1 is assumed.
Previously, rec.level field was not uninitialized in SSL_provide_quic_data().
As a result, its value was always ssl_encryption_initial. Later in
ngx_quic_ciphers() such level resulted in resetting the cipher to
TLS1_3_CK_AES_128_GCM_SHA256 and using AES128 to encrypt the packet.
Now the level is initialized and the right cipher is used.
The layer is enabled as a fallback if the QUIC support is configured and the
BoringSSL API wasn't detected, or when using the --with-openssl option, also
compatible with QuicTLS and LibreSSL. For the latter, the layer is assumed
to be present if QUIC was requested, so it needs to be undefined to prevent
QUIC API redefinition as appropriate.
A previously used approach to test the TLSEXT_TYPE_quic_transport_parameters
macro doesn't work with OpenSSL 3.2 master branch where this macro appeared
with incompatible QUIC API. To fix the build there, the test is revised to
pass only for QuicTLS and LibreSSL.
Previously, ngx_quic_close_connection() could be called in a way that QUIC
connection was accessed after the call. In most cases the connection is not
closed right away, but close timeout is scheduled. However, it's not always
the case. Also, if the close process started earlier for a different reason,
calling ngx_quic_close_connection() may actually close the connection. The
connection object should not be accessed after that.
Now, when possible, return statement is added to eliminate post-close connection
object access. In other places ngx_quic_close_connection() is substituted with
posting close event.
Also, the new way of closing connection in ngx_quic_stream_cleanup_handler()
fixes another problem in this function. Previously it passed stream connection
instead of QUIC connection to ngx_quic_close_connection(). This could result
in incomplete connection shutdown. One consequence of that could be that QUIC
streams were freed without shutting down their application contexts. This could
result in another use-after-free.
Found by Coverity (CID 1530402).
The qsock->sockaddr field is a ngx_sockaddr_t union, and therefore can hold
any sockaddr (and union members, such qsock->sockaddr.sockaddr, can be used
to access appropriate variant of the sockaddr). It is better to set it via
qsock->sockaddr itself though, and not qsock->sockaddr.sockaddr, so static
analyzers won't complain about out-of-bounds access.
Prodded by Coverity (CID 1530403).
Previously, ngx_udp_rbtree_insert_value() was used for plain UDP and
ngx_quic_rbtree_insert_value() was used for QUIC. Because of this it was
impossible to initialize connection tree in ngx_create_listening() since
this function is not aware what kind of listening it creates.
Now ngx_udp_rbtree_insert_value() is used for both QUIC and UDP. To make
is possible, a generic key field is added to ngx_udp_connection_t. It keeps
client address for UDP and connection ID for QUIC.
The directive used to set the value of the "max_udp_payload_size" transport
parameter. According to RFC 9000, Section 18.2, the value specifies the size
of buffer for reading incoming datagrams:
This limit does act as an additional constraint on datagram size in
the same way as the path MTU, but it is a property of the endpoint
and not the path; see Section 14. It is expected that this is the
space an endpoint dedicates to holding incoming packets.
Current QUIC implementation uses the maximum possible buffer size (65527) for
reading datagrams.
HTTP and Stream variables $remote_addr and $binary_remote_addr rely on
constant client address, particularly because they are cacheable.
However, QUIC client may migrate to a new address. While there's no perfect
way to handle this, the proposed solution is to copy client address to QUIC
stream at stream creation.
The change also fixes truncated $remote_addr if migration happened while the
stream was active. The reason is addr_text string was copied to stream by
value.
The existing logic to evaluate multi header "$sent_http_*" variables,
such as $sent_http_cache_control, as previously introduced in 1.23.0,
doesn't take into account that one or more elements can be cleared,
yet still present in a linked list, pointed to by the next field.
Such elements don't contribute to the resulting variable length, an
attempt to append a separator for them ends up in out of bounds write.
This is not possible with standard modules, though at least one third
party module is known to override multi header values this way, so it
makes sense to harden the logic.
The fix restores a generic boundary check.
Previously, the value was not set and remained zero. While in nginx code the
value of c->sockaddr is accessed without taking c->socklen into account,
invalid c->socklen could lead to unexpected results in third-party modules.
Previously, the post-migration value of addr_text could be truncated, if
it was longer than the previous one. Also, the new value always included
port, which should not be there.
According to RFC 9000, 8.2.4. Failed Path Validation,
the following value is recommended as a validation timeout:
A value of three times the larger of the current PTO
or the PTO for the new path (using kInitialRtt, as
defined in [QUIC-RECOVERY]) is RECOMMENDED.
The change adds PTO of the new path to the equation as the lower bound.
Path validation packets containing PATH_CHALLENGE frames are sent separately
from regular frame queue, because of the need to use a decicated path and pad
the packets. The packets are sent periodically, separately from the regular
probe/lost detection mechanism. A path validation packet is resent up to 3
times, each time after PTO expiration, with increasing per-path PTO backoff.
The check is needed for clients in order to unblock a server due to
anti-amplification limits, and it seems to make no sense for servers.
See RFC 9002, A.6 and A.8 for a further explanation.
This makes max_ack_delay to now always account, notably including
PATH_CHALLENGE timers as noted in the last paragraph of 9000, 9.4,
unlike when it was only used when there are packets in flight.
While here, fixed nearby style.
Previously, ssl_encryption_application was hardcoded. Before 9553eea74f2a,
ngx_quic_frame_sendto() was used only for PATH_CHALLENGE/PATH_RESPONSE sent
at the application level only. Since 9553eea74f2a, ngx_quic_frame_sendto()
is also used for CONNECTION_CLOSE, which can be sent at initial level after
SSL handshake error or rejection. This resulted in packet encryption error.
Now level is copied from frame, which fixes the error.
Previously, before sending CONNECTION_CLOSE to client, all pending frames
were sent. This is redundant and could prevent CONNECTION_CLOSE from being
sent due to congestion control. Now pending frames are freed and
CONNECTION_CLOSE is sent without congestion control, as advised by RFC 9002:
Packets containing frames besides ACK or CONNECTION_CLOSE frames
count toward congestion control limits and are considered to be in flight.
Do not corrupt frame data chain pointer on ngx_quic_read_buffer() error.
The error leads to closing a QUIC connection where the frame may be used
as part of the QUIC connection tear down, which envolves writing pending
frames, including this one.
The rcf->studies list is unconditionally accessed by ngx_regex_cleanup(),
and this used to cause NULL pointer dereference if allocation
failed. Fix is to set cleanup handler only when allocation succeeds.
Previously, waiting on a shared connection was not allowed, because the only
type of such connection was plain UDP. However, QUIC stream connections are
also shared since they share socket descriptor with the listen connection.
Meanwhile, it's perfectly normal to wait on such connections.
The issue manifested itself with stream write errors when the amount of data
exceeded stream buffer size or flow control. Now no error is triggered
and Stream write module is allowed to wait for buffer space to become available.
When a stream is created by client, it's often the case that nginx will send
immediate response on that stream. An example is HTTP/3 request stream, which
in most cases quickly replies with at least HTTP headers.
QUIC stream init handlers are called from a posted event. Output QUIC
frames are also sent to client from a posted event, called the push event.
If the push event is posted before the stream init event, then output produced
by stream may trigger sending an extra UDP datagram. To address this, push
event is now re-posted when a new stream init event is posted.
An example is handling 0-RTT packets. Client typically sends an init packet
coalesced with a 0-RTT packet. Previously, nginx replied with a padded CRYPTO
datagram, followed by a 1-RTT stream reply datagram. Now CRYPTO and STREAM
packets are coalesced in one reply datagram, which saves bandwidth.
Other examples include coalescing 1-RTT first stream response, and
MAX_STREAMS/STREAM sent in response to ACK/STREAM.
It now uses custom alloc_aligned() wrapper for all allocations,
therefore all allocations are larger than expected by (64 + sizeof(void*)).
Further, they are seen as allocations of 1 element. Relevant calculations
were adjusted to reflect this, and state allocation is now protected
with a flag to avoid misinterpreting other allocations as the zlib
deflate_state allocation.
Further, it no longer forces window bits to 13 on compression level 1,
so the comment was adjusted to reflect this.
When establishing a connection to the backend, nginx blocks reading
from the client with ngx_mail_proxy_block_read(). Previously, such
events were lost, and in some cases this resulted in connection hangs.
Notably, this affected mail_imap_ssl.t on Windows, since the test
closes connections after requesting authentication, but without
waiting for any responses (so the connection close events might be
lost).
Fix is to post an event to read from the client after connecting to
the backend if there were blocked events.
SSL context is not present if the default server has neither certificates nor
ssl_reject_handshake enabled. Previously, this led to null pointer dereference
before it would be caught with configuration checks.
Additionally, non-default servers with distinct SSL contexts need to initialize
compatibility layer in order to complete a QUIC handshake.
This ensures that errors which happen during logging to syslog are logged
with proper context, such as "while logging to syslog" and the server name.
Prodded by Safar Safarly.
During initial startup the ngx_cycle->hostname is not available, and
previously this resulted in incorrect logging. Instead, hostname from the
configuration being parsed is now preserved in the syslog peer structure
and then used during logging.
Similarly, ngx_cycle->log might not match the configuration where the
syslog peer is defined if the configuration is not yet fully applied,
and previously this resulted in unexpected logging of syslog errors
and debug information. Instead, cf->cycle->new_log is now referenced
in the syslog peer structure and used for logging, similarly to how it
is done in other modules.
Similarly to ticket #274 (7354:1812f1d79d84), early request finalization
without calling ngx_http_run_posted_requests() resulted in a connection
hang (a socket leak) if the 400 (Bad Request) error was generated in
ngx_http_v2_state_process_header() due to invalid request headers and
"return 444" was used in error_page 400.
As tested with tlsfuzzer with LibreSSL 3.7.0, the following errors are
certainly client-related:
SSL_do_handshake() failed (SSL: error:14026073:SSL routines:ACCEPT_SR_CLNT_HELLO:bad packet length)
SSL_do_handshake() failed (SSL: error:1402612C:SSL routines:ACCEPT_SR_CLNT_HELLO:ssl3 session id too long)
SSL_do_handshake() failed (SSL: error:140380EA:SSL routines:ACCEPT_SR_KEY_EXCH:tls rsa encrypted value length is wrong)
Accordingly, the SSL_R_BAD_PACKET_LENGTH ("bad packet length"),
SSL_R_SSL3_SESSION_ID_TOO_LONG ("ssl3 session id too long"),
SSL_R_TLS_RSA_ENCRYPTED_VALUE_LENGTH_IS_WRONG ("tls rsa encrypted value
length is wrong") errors are now logged at the "info" level.
To further differentiate client-related errors and adjust logging levels
of various SSL errors, nginx was tested with tlsfuzzer with multiple
OpenSSL versions (3.1.0-beta1, 3.0.8, 1.1.1t, 1.1.0l, 1.0.2u, 1.0.1u,
1.0.0s, 0.9.8zh).
The following errors were observed during tlsfuzzer runs with OpenSSL 3.0.8,
and are clearly client-related:
SSL_do_handshake() failed (SSL: error:0A000092:SSL routines::data length too long)
SSL_do_handshake() failed (SSL: error:0A0000A0:SSL routines::length too short)
SSL_do_handshake() failed (SSL: error:0A000124:SSL routines::bad legacy version)
SSL_do_handshake() failed (SSL: error:0A000178:SSL routines::no shared signature algorithms)
Accordingly, the SSL_R_DATA_LENGTH_TOO_LONG ("data length too long"),
SSL_R_LENGTH_TOO_SHORT ("length too short"), SSL_R_BAD_LEGACY_VERSION
("bad legacy version"), and SSL_R_NO_SHARED_SIGNATURE_ALGORITHMS
("no shared signature algorithms", misspelled as "sigature" in OpenSSL 1.0.2)
errors are now logged at the "info" level.
Additionally, the following errors were observed with OpenSSL 3.0.8 and
with TLSv1.3 enabled:
SSL_do_handshake() failed (SSL: error:0A00006F:SSL routines::bad digest length)
SSL_do_handshake() failed (SSL: error:0A000070:SSL routines::missing sigalgs extension)
SSL_do_handshake() failed (SSL: error:0A000096:SSL routines::encrypted length too long)
SSL_do_handshake() failed (SSL: error:0A00010F:SSL routines::bad length)
SSL_read() failed (SSL: error:0A00007A:SSL routines::bad key update)
SSL_read() failed (SSL: error:0A000125:SSL routines::mixed handshake and non handshake data)
Accordingly, the SSL_R_BAD_DIGEST_LENGTH ("bad digest length"),
SSL_R_MISSING_SIGALGS_EXTENSION ("missing sigalgs extension"),
SSL_R_ENCRYPTED_LENGTH_TOO_LONG ("encrypted length too long"),
SSL_R_BAD_LENGTH ("bad length"), SSL_R_BAD_KEY_UPDATE ("bad key update"),
and SSL_R_MIXED_HANDSHAKE_AND_NON_HANDSHAKE_DATA ("mixed handshake and non
handshake data") errors are now logged at the "info" level.
Additionally, the following errors were observed with OpenSSL 1.1.1t:
SSL_do_handshake() failed (SSL: error:14094091:SSL routines:ssl3_read_bytes:data between ccs and finished)
SSL_do_handshake() failed (SSL: error:14094199:SSL routines:ssl3_read_bytes:too many warn alerts)
SSL_read() failed (SSL: error:1408F0C6:SSL routines:ssl3_get_record:packet length too long)
SSL_read() failed (SSL: error:14094085:SSL routines:ssl3_read_bytes:ccs received early)
Accordingly, the SSL_R_CCS_RECEIVED_EARLY ("ccs received early"),
SSL_R_DATA_BETWEEN_CCS_AND_FINISHED ("data between ccs and finished"),
SSL_R_PACKET_LENGTH_TOO_LONG ("packet length too long"), and
SSL_R_TOO_MANY_WARN_ALERTS ("too many warn alerts") errors are now logged
at the "info" level.
Additionally, the following errors were observed with OpenSSL 1.0.2u:
SSL_do_handshake() failed (SSL: error:1407612A:SSL routines:SSL23_GET_CLIENT_HELLO:record too small)
SSL_do_handshake() failed (SSL: error:1408C09A:SSL routines:ssl3_get_finished:got a fin before a ccs)
Accordingly, the SSL_R_RECORD_TOO_SMALL ("record too small") and
SSL_R_GOT_A_FIN_BEFORE_A_CCS ("got a fin before a ccs") errors are now
logged at the "info" level.
No additional client-related errors were observed while testing with
OpenSSL 3.1.0-beta1, OpenSSL 1.1.0l, OpenSSL 1.0.1u, OpenSSL 1.0.0s,
and OpenSSL 0.9.8zh.
In some cases there might be multiple errors in the OpenSSL error queue,
notably when a libcrypto call fails, and then the SSL layer generates
an error itself. For example, the following errors were observed
with OpenSSL 3.0.8 with TLSv1.3 enabled:
SSL_do_handshake() failed (SSL: error:02800066:Diffie-Hellman routines::invalid public key error:0A000132:SSL routines::bad ecpoint)
SSL_do_handshake() failed (SSL: error:08000066:elliptic curve routines::invalid encoding error:0A000132:SSL routines::bad ecpoint)
SSL_do_handshake() failed (SSL: error:0800006B:elliptic curve routines::point is not on curve error:0A000132:SSL routines::bad ecpoint)
In such cases it seems to be better to determine logging level based on
the last error in the error queue (the one added by the SSL layer,
SSL_R_BAD_ECPOINT in all of the above example example errors). To do so,
the ngx_ssl_connection_error() function was changed to use
ERR_peek_last_error().
An UTF-8 octet sequence cannot start with a 11111xxx byte (above 0xf8),
see https://datatracker.ietf.org/doc/html/rfc3629#section-3. Previously,
such bytes were accepted by ngx_utf8_decode() and misinterpreted as 11110xxx
bytes (as in a 4-byte sequence). While unlikely, this can potentially cause
issues.
Fix is to explicitly reject such bytes in ngx_utf8_decode().
Just a drive letter might not correctly represent file system being used,
notably when using symlinks (as created by "mklink /d"). As such, instead
of trying to call GetDiskFreeSpace() with just a drive letter, we now always
use GetDiskFreeSpace() with full path.
Further, it looks like the code to use just a drive letter never worked,
since it tried to test name[2] instead of name[1] to be ':'.
This ensures that ngx_win32_rename_file() will support non-ASCII names
when supported by the wrappers.
Notably, this is used by PUT requests in the dav module when overwriting
existing files with non-ASCII names (ticket #1433).
Previously, ngx_win32_rename_file() retried on all errors returned by
MoveFile() to a temporary name. It only make sense, however, to retry
when the destination file already exists, similarly to the condition
when ngx_win32_rename_file() is called. Retrying on other errors is
meaningless and might result in an infinite loop.
This makes it possible to create directories under prefix with non-ASCII
characters, as well as makes it possible to create directories with non-ASCII
characters when using the dav module (ticket #1433).
To ensure that the dav module operations are restricted similarly to
other file operations (in particular, short names are not allowed), the
ngx_win32_check_filename() function is used. It improved to support
checking of just dirname, and now can be used to check paths when creating
files or directories.
Notably, ngx_open_dir() now supports opening directories with non-ASCII
characters, and directory entries returned by ngx_read_dir() are properly
converted to UTF-8.
To ensure proper target selection the NGX_MACHINE variable is now set
based on the MSVC compiler output, and the OpenSSL target is set based
on it.
This is not important as long as "no-asm" is used (as in misc/GNUmakefile
and win32 build instructions), but might be beneficial if someone is trying
to build OpenSSL with assembler code.
Previously, NGX_MACHINE was not set when crossbuilding, resulting in
NGX_ALIGNMENT=16 being used in 32-bit builds (if not explicitly set to a
correct value). This in turn might result in memory corruption in
ngx_palloc() (as there are no usable aligned allocator on Windows, and
normal malloc() is used instead, which provides 8 byte alignment on
32-bit platforms).
To fix this, now i386 machine is set when crossbuilding, so nginx won't
assume strict alignment requirements.
Output examples in English, Russian, and Spanish:
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86
Оптимизирующий 32-разрядный компилятор Microsoft (R) C/C++ версии 16.00.30319.01 для 80x86
Compilador de optimización de C/C++ de Microsoft (R) versión 16.00.30319.01 para x64
Since most of the words are translated, instead of looking for the words
"Compiler Version" we now search for "C/C++" and the version number.
This is expected to help with clients using pipelining with some constant
depth, such as apt[1][2].
When downloading many resources, apt uses pipelining with some constant
depth, a number of requests in flight. This essentially means that after
receiving a response it sends an additional request to the server, and
this can result in requests arriving to the server at any time. Further,
additional requests are sent one-by-one, and can be easily seen as such
(neither as pipelined, nor followed by pipelined requests).
The only safe approach to close such connections (for example, when
keepalive_requests is reached) is with lingering. To do so, now nginx
monitors if pipelining was used on the connection, and if it was, closes
the connection with lingering.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=973861#10
[2] https://mailman.nginx.org/pipermail/nginx-devel/2023-January/ZA2SP5SJU55LHEBCJMFDB2AZVELRLTHI.html
Since 4611:2b6cb7528409 responses from the gzip static, flv, and mp4 modules
can be used with subrequests, though empty files were not properly handled.
Empty gzipped, flv, and mp4 files thus resulted in "zero size buf in output"
alerts. While valid corresponding files are not expected to be empty, such
files shouldn't result in alerts.
Fix is to set b->sync on such empty subrequest responses, similarly to what
ngx_http_send_special() does.
Additionally, the static module, the ngx_http_send_response() function, and
file cache are modified to do the same instead of not sending the response
body at all in such cases, since not sending the response body at all is
believed to be at least questionable, and might break various filters
which do not expect such behaviour.
The "listen" directive in the http module can be used multiple times
in different server blocks. Originally, it was supposed to be specified
once with various socket options, and without any parameters in virtual
server blocks. For example:
server { listen 80 backlog=1024; server_name foo; ... }
server { listen 80; server_name bar; ... }
server { listen 80; server_name bazz; ... }
The address part of the syntax ("address[:port]" / "port" / "unix:path")
uniquely identifies the listening socket, and therefore is enough for
name-based virtual servers (to let nginx know that the virtual server
accepts requests on the listening socket in question).
To ensure that listening options do not conflict between virtual servers,
they were allowed only once. For example, the following configuration
will be rejected ("duplicate listen options for 0.0.0.0:80 in ..."):
server { listen 80 backlog=1024; server_name foo; ... }
server { listen 80 backlog=512; server_name bar; ... }
At some point it was, however, noticed, that it is sometimes convenient
to repeat some options for clarity. In nginx 0.8.51 the "ssl" parameter
was allowed to be specified multiple times, e.g.:
server { listen 443 ssl backlog=1024; server_name foo; ... }
server { listen 443 ssl; server_name bar; ... }
server { listen 443 ssl; server_name bazz; ... }
This approach makes configuration more readable, since SSL sockets are
immediately visible in the configuration. If this is not needed, just the
address can still be used.
Later, additional protocol-specific options similar to "ssl" were
introduced, notably "http2" and "proxy_protocol". With these options,
one can write:
server { listen 443 ssl backlog=1024; server_name foo; ... }
server { listen 443 http2; server_name bar; ... }
server { listen 443 proxy_protocol; server_name bazz; ... }
The resulting socket will use ssl, http2, and proxy_protocol, but this
is not really obvious from the configuration.
To emphasize such misleading configurations are discouraged, nginx now
warns as long as the "listen" directive is used with options different
from the options previously used if this is potentially confusing.
In particular, the following configurations are allowed:
server { listen 8401 ssl backlog=1024; server_name foo; }
server { listen 8401 ssl; server_name bar; }
server { listen 8401 ssl; server_name bazz; }
server { listen 8402 ssl http2 backlog=1024; server_name foo; }
server { listen 8402 ssl; server_name bar; }
server { listen 8402 ssl; server_name bazz; }
server { listen 8403 ssl; server_name bar; }
server { listen 8403 ssl; server_name bazz; }
server { listen 8403 ssl http2; server_name foo; }
server { listen 8404 ssl http2 backlog=1024; server_name foo; }
server { listen 8404 http2; server_name bar; }
server { listen 8404 http2; server_name bazz; }
server { listen 8405 ssl http2 backlog=1024; server_name foo; }
server { listen 8405 ssl http2; server_name bar; }
server { listen 8405 ssl http2; server_name bazz; }
server { listen 8406 ssl; server_name foo; }
server { listen 8406; server_name bar; }
server { listen 8406; server_name bazz; }
And the following configurations will generate warnings:
server { listen 8501 ssl http2 backlog=1024; server_name foo; }
server { listen 8501 http2; server_name bar; }
server { listen 8501 ssl; server_name bazz; }
server { listen 8502 backlog=1024; server_name foo; }
server { listen 8502 ssl; server_name bar; }
server { listen 8503 ssl; server_name foo; }
server { listen 8503 http2; server_name bar; }
server { listen 8504 ssl; server_name foo; }
server { listen 8504 http2; server_name bar; }
server { listen 8504 proxy_protocol; server_name bazz; }
server { listen 8505 ssl http2 proxy_protocol; server_name foo; }
server { listen 8505 ssl http2; server_name bar; }
server { listen 8505 ssl; server_name bazz; }
server { listen 8506 ssl http2; server_name foo; }
server { listen 8506 ssl; server_name bar; }
server { listen 8506; server_name bazz; }
server { listen 8507 ssl; server_name bar; }
server { listen 8507; server_name bazz; }
server { listen 8507 ssl http2; server_name foo; }
server { listen 8508 ssl; server_name bar; }
server { listen 8508; server_name bazz; }
server { listen 8508 ssl backlog=1024; server_name foo; }
server { listen 8509; server_name bazz; }
server { listen 8509 ssl; server_name bar; }
server { listen 8509 ssl backlog=1024; server_name foo; }
The basic idea is that at most two sets of protocol options are allowed:
the main one (with socket options, if any), and a shorter one, with options
being a subset of the main options, repeated for clarity. As long as the
shorter set of protocol options is used, all listen directives except the
main one should use it.
Now "listen" directve has a new "quic" parameter which enables QUIC protocol
for the address. Further, to enable HTTP/3, a new directive "http3" is
introduced. The hq-interop protocol is enabled by "http3_hq" as before.
Now application protocol is chosen by ALPN.
Previously used "http3" parameter of "listen" is deprecated.
A QUIC handshake failure breaks down into several cases:
- a handshake error which leads to a send_alert call
- an error triggered by the add_handshake_data callback
- internal errors (allocation etc)
Previously, in the first case, only error code was set in the send_alert
callback. Now the "handshake failed" reason phrase is set there as well.
In the second case, both code and reason are set by add_handshake_data.
In the last case, setting reason phrase is removed: returning NGX_ERROR
now leads to closing the connection with just INTERNAL_ERROR.
Reported by Jiuzhou Cui.
Previously, since 3550b00d9dc8, the token was allocated on stack, to get
rid of pool usage. Now the token is allocated by ngx_quic_copy_buffer()
in QUIC buffers, also used for STREAM, CRYPTO and ACK frames.
Previously, location prefix length in ngx_http_location_tree_node_t was
stored as "u_char", and therefore location prefixes longer than 255 bytes
were handled incorrectly.
Fix is to use "u_short" instead. With "u_short", prefixes up to 65535 bytes
can be safely used, and this isn't reachable due to NGX_CONF_BUFFER, which
is 4096 bytes.
In contrast to on-the-fly gzipping with gzip filter, static gzipped
representation as returned by gzip_static is persistent, and therefore
the same binary representation is available for future requests, making
it possible to use range requests.
Further, if a gzipped representation is re-generated with different
compression settings, it is expected to result in different ETag and
different size reported in the Content-Range header, making it possible
to safely use range requests anyway.
As such, ranges are now allowed for files returned by gzip_static.
In nginx source code the inttypes.h include, if available, is used to define
standard integer types. Changed the SO_COOKIE configure test to follow this.
Specifically, now it is kept unset until streams are initialized.
Notably, this unbreaks OCSP with client certificates after 35e27117b593.
Previously, the read event could be posted prematurely via ngx_quic_set_event()
e.g., as part of handling a STREAM frame.
Previously, streams were initialized in early keys handler. However, client
transport parameters may not be available by then. This happens, for example,
when using QuicTLS. Now streams are initialized in ngx_quic_crypto_input()
after calling SSL_do_handshake() for both 0-RTT and 1-RTT.
Previously, there was no timeout for a request stream blocked on insert count,
which could result in infinite wait. Now client_header_timeout is set when
stream is first blocked.
Now, when RESET_STREAM is sent or received, or when streams are closed,
stream connection error flag is set. Previously, only stream state was
changed, which resulted in setting the error flag only after calling
recv()/send()/send_chain(). However, there are cases when none of these
functions is called, but it's still important to know if the stream is being
closed. For example, when an HTTP/3 request stream is blocked on insert count,
receiving RESET_STREAM should trigger stream closure, which was not the case.
The change also fixes ngx_http_upstream_check_broken_connection() and
ngx_http_test_reading() with QUIC streams.
Previously, stream events were added and deleted by ngx_handle_read_event() and
ngx_handle_write_event() in a way similar to level-triggered events. However,
QUIC stream events are effectively edge-triggered and can stay active all time.
Moreover, the events are now active since the moment a stream is created.
Previously, start_time wasn't set for a new stream.
The fix is to derive it from the parent connection.
Also it's used to simplify tracking keepalive_time.
As per RFC 9204, section 3.2.2, a new entry can reference an entry in the
dynamic table that will be evicted when adding this new entry into the dynamic
table.
Previously, such inserts resulted in use-after-free since the old entry was
evicted before the insertion (ticket #2431). Now it's evicted after the
insertion.
This change fixes Insert with Name Reference and Duplicate encoder instructions.
Ports difference must be respected when checking addresses for duplicates,
otherwise configurations like this are broken:
listen 127.0.0.1:6000-6005
It was broken by 4cc2bfeff46c (nginx 1.23.3).
Previously, keepalive timer was deleted in ngx_http_v3_wait_request_handler()
and set in request cleanup handler. This worked for HTTP/3 connections, but not
for hq connections. Now keepalive timer is deleted in
ngx_http_v3_init_request_stream() and set in connection cleanup handler,
which works both for HTTP/3 and hq.
It's called after handshake completion or prior to the first early data stream
creation. The callback should initialize application-level data before
creating streams.
HTTP/3 callback implementation sets keepalive timer and sends SETTINGS.
Also, this allows to limit max handshake time in ngx_http_v3_init_stream().
All these events are created in context of a client connection and are deleted
when the connection is closed. Setting ev->cancelable could trigger premature
connection closure and a socket leak alert.
Now main QUIC connection for HTTP/3 always has c->idle flag set. This allows
the connection to receive worker shutdown notification. It is passed to
application level via a new conf->shutdown() callback.
The HTTP/3 shutdown callback sends GOAWAY to client and gracefully shuts down
the QUIC connection.
Previously, stream was kept alive until all its data is sent. This resulted
in disabling retransmission of final part of stream when QUIC connection
was closed right after closing stream connection.
The connection is automatically switched to this mode by transport layer when
there are no non-cancelable streams. Currently, cancelable streams are
HTTP/3 encoder/decoder/control streams.
Previously, ngx_quic_finalize_connection() closed the connection with NGX_ERROR
code, which resulted in immediate connection closure. Now the code is NGX_OK,
which provides a more graceful shutdown with a timeout.
Previously, zero was used for this purpose. However, NGX_QUIC_ERR_NO_ERROR is
zero too. As a result, NGX_QUIC_ERR_NO_ERROR was changed to
NGX_QUIC_ERR_INTERNAL_ERROR when closing a QUIC connection.
If a client packet carrying a stream data frame is not acked due to packet loss,
the stream data is retransmitted later by client. It's also possible that the
retransmitted range is bigger than before due to more stream data being
available by then. If the original data was read out by the application,
there would be no read event triggered by the retransmitted frame, even though
it contains new data.
They are not supported by MSVC till 2012.
SSL_QUIC_METHOD initialization is moved to run-time to preserve portability
among SSL library implementations, which allows to reduce its visibility.
Note using of a static storage to keep SSL_set_quic_method() reference valid.
Previously, ngx_quic_hkdf_t variables used declaration with assignment
in the middle of a function, which is not supported by MSVC 2010.
Fixing this also required to rewrite the ngx_quic_hkdf_set macro
and to switch to an explicit array size.
Previously, HTTP/3 stream connection didn't inherit the servername regex
from the main QUIC connection saved when processing SNI and using regular
expressions in server names. As a result, it didn't execute to set regex
captures when choosing the virtual server while parsing HTTP/3 headers.
The type field was added in 7999d3fbb765 at early stages of QUIC implementation
and was not initialized for default listen. Missing initialization resulted in
default listen socket creation error.
SSL_CIPHER_get_protocol_id() appeared in BoringSSL somewhere between
BORINGSSL_API_VERSION 12 and 13 for compatibility with OpenSSL 1.1.1.
It was adopted without a proper macro test, which remained unnoticed.
This justifies that such old BoringSSL API isn't widely used and its
support can be dropped.
While here, removed SSL_set_quic_use_legacy_codepoint() that became
useless after the default was flipped in BoringSSL over a year ago.
Setting QUIC methods is converted to use C99 designated initializers
for simplicity, as LibreSSL 3.6.0 has different SSL_QUIC_METHOD layout.
Additionally, only set_read_secret/set_write_secret callbacks are set.
Although they are preferred in LibreSSL over set_encryption_secrets,
better be on a safe side as LibreSSL has unexpectedly incompatible
set_encryption_secrets calling convention expressed in passing read
and write secrets split in separate calls, unlike this is documented
in old BoringSSL sources. To avoid introducing further changes for
the old API, it is simply disabled.
This function is present in QuicTLS only. After SSL_READ_EARLY_DATA_SUCCESS
became visible in LibreSSL together with experimental QUIC API, this required
to revise the conditional compilation test to use more narrow macros.
After BoringSSL aligned[1] with OpenSSL on TLS1_3_CK_* macros, and
LibreSSL uses OpenSSL naming, our own variants can be dropped now.
Compatibility is preserved with libraries that lack these macros.
Additionally, transition to SSL_CIPHER_get_id() fixes build error
with LibreSSL that doesn't implement SSL_CIPHER_get_protocol_id().
[1] https://boringssl.googlesource.com/boringssl/+/dfddbc4ded
When client DATA frame header and its content come in different QUIC packets,
it may happen that only the header is processed by the first
ngx_http_v3_request_body_filter() call. In this case an empty request body
buffer is added to r->request_body->bufs, which is later reused in a
subsequent ngx_http_v3_request_body_filter() call without being removed from
the body chain. As a result, rb->request_body->bufs ends up with two copies of
the same buffer.
The fix is to avoid adding empty request body buffers to r->request_body->bufs.
Previously, QUIC used the existing UDP framework, which was created for UDP in
Stream. However the way QUIC connections are created and looked up is different
from the way UDP connections in Stream are created and looked up. Now these
two implementations are decoupled.
Previously, last buffer was tracked by keeping a pointer to the previous
chain link "next" field. When the previous buffer was split and then removed,
the pointer was no longer valid. Writing at this pointer resulted in broken
data chains.
Now last buffer is tracked by keeping a direct pointer to it.
The object is used instead of ngx_chain_t pointer for buffer operations like
ngx_quic_write_chain() and ngx_quic_read_chain(). These functions are renamed
to ngx_quic_write_buffer() and ngx_quic_read_buffer().
Now ngx_quic_stream_t is decoupled from ngx_connection_t in a way that it
can persist after connection is closed by application. During this period,
server is expecting stream final size from client for correct flow control.
Also, buffered output is sent to client as more flow control credit is granted.
As shown in RFC 8446, section 2.2, Figure 3, and further specified in
section 4.6.1, BoringSSL releases session tickets in Application Data
(along with Finished) early, based on a precalculated client Finished
transcript, once client signalled early data in extensions.
Initially, frames are genereated and stored in ctx->frames.
Next, ngx_quic_output() collects frames to be sent in in ctx->sending.
On failure, ngx_quic_revert_sned() returns frames into ctx->frames.
On success, the ngx_quic_commit_send() moves ack-eliciting frames into
ctx->sent and frees non-ack-eliciting frames.
This function also updates in-flight bytes counter, so only actually sent
frames are accounted.
The counter is decremented in the following cases:
- acknowledgment is received
- packet was declared lost
- we are discarding context completely
In each of this cases frame is removed from ctx->sent queue and in-flight
counter is accordingly decremented.
The patch fixes the case of discarding context - only removing frames
from ctx->sent must be followed by in-flight bytes counter decrement,
otherwise cg->in_flight could experience type underflow.
The issue appeared in b1676cd64dc9.
The cd8018bc81a5 fixed unintended send of non-padded initial packets,
but failed to restore context properly: only processed contexts need
to be restored. As a consequence, a packet number could be restored
from uninitialized value.
Previously, the flag could be reset after send_chain() with a limit, even
though there was room for more data. The application then started waiting for
a write event notification, which never happened.
Now the wev->ready flag is only reset when flow control is exhausted.
The switch happens when received byte counter reaches stream final size.
Previously, this state was skipped. The stream went from SIZE_KNOWN to
DATA_READ when all bytes were read by application.
The change prevents STOP_SENDING frames from being sent when all data is
received from client, but not yet fully read by application.
Previously, size was calculated based on the number of input bytes processed
by the function. Now only the copied bytes are considered. This prevents
overlapping buffers from contributing twice to the overall written size.
Previously, non-padded initial packet could be sent as a result of the
following situation:
- initial queue is not empty (so padding to 1200 is required)
- handshake queue is not empty (so padding is to be added after h/s packet)
- path is limited
If serializing handshake packet would violate path limit, such packet was
omitted, and the non-padded initial packet was sent.
The fix is to avoid sending the packet at all in such case. This follows the
original intention introduced in c5155a0cb12f.
- wording in log->action is adjusted to match function names.
- connection close steps are made obvious and start with "quic close" prefix:
*1 quic close initiated rc:-4
*1 quic close silent drain:0 timedout:1
*1 quic close resumed rc:-1
*1 quic close resumed rc:-1
*1 quic close resumed rc:-4
*1 quic close completed
this makes it easy to understand if particular "close" record is an initial
cause or lasting process, or the final one.
- cases of close without quic connection now logged as "packet rejected":
*14 quic run
*14 quic packet rx long flags:ec version:1
*14 quic packet rx hs len:61
*14 quic packet rx dcid len:20 00000000000002c32f60e4aa2b90a64a39dc4228
*14 quic packet rx scid len:8 81190308612cd019
*14 quic expected initial, got handshake
*14 quic packet done rc:-1 level:hs decr:0 pn:0 perr:0
*14 quic packet rejected rc:-1, cleanup connection
*14 reusable connection: 0
this makes it easy to spot early packet rejection and avoid confuse with
quic connection closing (which in fact was not even created).
- packet processing summary now uses same prefix "quic packet done rc:"
- added debug to places where packet was rejected without any reason logged
The NGX_DECLINED is replaced with NGX_DONE to match closer to return code
of ngx_quic_handle_packet() and ngx_quic_close_connection() rc argument.
The ngx_quic_close_connection() rc code is used only when quic connection
exists, thus anything goes if qc == NULL.
The ngx_quic_handle_datagram() does not return NG_OK in cases when quic
connection is not yet created.
Previously, closure detection for server-initiated uni streams was not properly
implemented. Instead, HTTP/3 code relied on QUIC code posting the read event
and setting rev->error when it needed to close the stream. Then, regular
uni stream read handler called c->recv() and received error, which closed the
stream. This was an ad-hoc solution. If, for whatever reason, the read
handler was called earlier, c->recv() would return 0, which would also close
the stream.
Now server-initiated uni streams have a separate read event handler for
tracking stream closure. The handler calls c->recv(), which normally returns
0, but may return error in case of closure.
Sending the instruction is delayed until the end of the current event cycle.
Delaying the instruction is allowed by quic-qpack-21, section 2.2.2.3.
The goal is to reduce the amount of data sent back to client by accumulating
several inserts in one instruction and sometimes not sending the instruction at
all, if Section Acknowledgement was sent just before it.
Operations like ngx_quic_open_stream(), ngx_http_quic_get_connection(),
ngx_http_v3_finalize_connection(), ngx_http_v3_shutdown_connection() used to
receive a QUIC stream connection. Now they can receive the main QUIC
connection as well. This is useful when calling them from a stream context.
This is to ease transition with oldish BoringSSL versions,
the default for SSL_set_quic_use_legacy_codepoint() has been
flipped in BoringSSL a1d3bfb64fd7ef2cb178b5b515522ffd75d7b8c5.
Previously, when input ended on a QUIC buffer boundary, input chain was not
advanced to the next buffer. As a result, ngx_quic_write_chain() returned
a chain with an empty buffer instead of NULL. This broke HTTP write filter,
preventing it from closing the HTTP request and eventually timing out.
Now input chain is always advanced to a buffer that has data, before checking
QUIC buffer boundary condition.
RFC 9000, 9.3. Responding to Connection Migration:
An endpoint only changes the address to which it sends packets in
response to the highest-numbered non-probing packet.
The patch extends this requirement to probing packets. Although it may
seem excessive, it helps with mitigation of reply attacks (when an off-path
attacker has copied packet with PATH_CHALLENGE and uses different
addresses to exhaust available connection ids).
The quic connection now holds active, backup and probe paths instead
of sockets. The number of migration paths is now limited and cannot
be inflated by a bad client or an attacker.
The client id is now associated with path rather than socket. This allows
to simplify processing of output and connection ids handling.
New migration abandons any previously started migrations. This allows to
free consumed client ids and request new for use in future migrations and
make progress in case when connection id limit is hit during migration.
A path now can be revalidated without losing its state.
The patch also fixes various issues with NAT rebinding case handling:
- paths are now validated (previously, there was no validation
and paths were left in limited state)
- attempt to reuse id on different path is now again verified
(this was broken in 40445fc7c403)
- former path is now validated in case of apparent migration
The function splits a buffer at given offset. The function is now
called from ngx_quic_read_chain() and ngx_quic_write_chain(), which
simplifies both functions.
After 9ae239d2547d, ngx_quic_handle_write_event() no longer runs into
ngx_send_lowat() for QUIC connections, so the check became excessive.
It is assumed that external modules operating with SO_SNDLOWAT
(I'm not aware of any) should do this check on their own.
Previously, ngx_quic_write_chain() treated each input buffer as a memory
buffer, which is not always the case. Special buffers were not skipped, which
is especially important when hitting the input byte limit.
The issue manifested itself with ngx_quic_write_chain() returning a non-empty
chain consisting of a special last_buf buffer when called from QUIC stream
send_chain(). In order for this to happen, input byte limit should be equal to
the chain length, and the input chain should end with an empty last_buf buffer.
An easy way to achieve this is the following:
location /empty {
return 200;
}
When this non-empty chain was returned from send_chain(), it signalled to the
caller that input was blocked, while in fact it wasn't. This prevented HTTP
request from finalization, which prevented QUIC from sending STREAM FIN to
the client. The QUIC stream was then reset after a timeout.
Now special buffers are skipped and send_chain() returns NULL in the case
above, which signals to the caller a successful operation.
Also, original byte limit is now passed to ngx_quic_write_chain() from
send_chain() instead of actual chain length to make sure it's never zero.
Previously, when a STREAM FIN frame with no data bytes was received after all
prior stream data were already read by the application layer, the frame was
ignored and eof was not reported to the application.
It was mostly copy of the ngx_quic_listen(). Now ngx_quic_listen() no
longer generates server id and increments seqnum. Instead, the server
id is generated when the socket is created.
The ngx_quic_alloc_socket() function is renamed to ngx_quic_create_socket().
Previously, buffer lists was used to track used buffers. Now reference
counter is used instead. The new implementation is simpler and faster with
many buffer clones.
They are replaced with ngx_quic_write_chain() and ngx_quic_read_chain().
These functions represent the API to data buffering.
The first function adds data of given size at given offset to the buffer.
Now it returns the unwritten part of the chain similar to c->send_chain().
The second function returns data of given size from the beginning of the buffer.
Its second argument and return value are swapped compared to
ngx_quic_split_bufs() to better match ngx_quic_write_chain().
Added, returned and stored data are regular ngx_chain_t/ngx_buf_t chains.
Missing data is marked with b->sync flag.
The functions are now used in both send and recv data chains in QUIC streams.
Previously, when a few bytes were send to a QUIC stream by the application, a
4K buffer was allocated for these bytes. Then a STREAM frame was created and
that entire buffer was used as data for that frame. The frame with the buffer
were in use up until the frame was acked by client. Meanwhile, when more
bytes were send to the stream, more buffers were allocated and assigned as
data to newer STREAM frames. In this scenario most buffer memory is unused.
Now the unused part of the stream output buffer is available for further
stream output while earlier parts of the buffer are waiting to be acked.
This is achieved by splitting the output buffer.
The output is always sent to the active path, which is stored in the
quic connection. There is no need to pass it in arguments.
When output has to be send to to a specific path (in rare cases, such as
path probing), a separate method exists (ngx_quic_frame_sendto()).
The path validation status and anti-amplification limit status is actually
two different variables. It is possible that validating path should not
be limited (for example, when re-validating former path).
Previously, path was considered valid during arbitrary selected 10m timeout
since validation. This is quite not what RFC 9000 says; the relevant
part is:
An endpoint MAY skip validation of a peer address if that
address has been seen recently.
The patch considers a path to be 'recently seen' if packets were received
during idle timeout. If a packet is received from the path that was seen
not so recently, such path is considered new, and anti-amplification
restrictions apply.
After creation, a client stream is added to qc->streams.uninitialized queue.
After initialization it's removed from the queue. If a stream is never
initialized, it is freed in ngx_quic_close_streams(). Stream initializer
is now set as read event handler in stream connection.
Previously qc->streams.uninitialized was used only for delayed stream
initialization.
The change makes it possible not to handle separately the case of a new stream
in stream-related frame handlers. It makes these handlers simpler since new
streams and existing streams are now handled by the same code.
The SSL_OP_ENABLE_MIDDLEBOX_COMPAT option is provided by QuicTLS and enabled
by default in the newly created SSL contexts. SSL_set_quic_method() is used
to clear it, which is required for SSL handshake to work on QUIC connections.
Switching context in the ngx_http_ssl_servername() SNI callback overrides SSL
options from the new SSL context. This results in the option set again.
Fix is to explicitly clear it when switching to another SSL context.
Initially reported here (in Russian):
http://mailman.nginx.org/pipermail/nginx-ru/2021-November/063989.html
ngx_http_v3_tables.h and ngx_http_v3_tables.c are renamed to
ngx_http_v3_table.h and ngx_http_v3_table.c to better match HTTP/2 code.
ngx_http_v3_streams.h and ngx_http_v3_streams.c are renamed to
ngx_http_v3_uni.h and ngx_http_v3_uni.c to better match their content.
Directives that set transport parameters are removed from the configuration.
Corresponding values are derived from the quic configuration or initialized
to default. Whenever possible, quic configuration parameters are taken from
higher-level protocol settings, i.e. HTTP/3.
A new variable $http3 is added. The variable equals to "h3" for HTTP/3
connections, "hq" for hq connections and is an empty string otherwise.
The variable $quic is eliminated.
The new variable is similar to $http2 variable.
RFC 9000 19.16
The sequence number specified in a RETIRE_CONNECTION_ID frame MUST NOT
refer to the Destination Connection ID field of the packet in which the
frame is contained.
Before the patch, the RETIRE_CONNECTION_ID frame was sent before switching
to the new client id. If retired client id was currently in use, this lead
to violation of the spec.
The c->udp->dgram may be NULL only if the quic connection was just
created: the ngx_event_udp_recvmsg() passes information about datagrams
to existing connections by providing information in c->udp.
If case of a new connection, c->udp is allocated by the QUIC code during
creation of quic connection (it uses c->sockaddr to initialize qsock->path).
Thus the check for qsock->path is excessive and can be read wrong, assuming
that other options possible, leading to warnings from clang static analyzer.
Removed sending CLOSE_CONNECTION directly to avoid duplicate frames,
since it is sent later again in SSL_do_handshake() error handling.
As such, removed redundant settings of error fields set elsewhere.
While here, improved debug message.
All open sockets are stored in a queue. There is no need to close some
of them separately. If it happens that active and backup point to same
socket, double close may happen (leading to possible segfault).
The RFC 9000 allows a packet from known CID arrive from unknown path:
These requirements regarding connection ID reuse apply only to the
sending of packets, as unintentional changes in path without a change
in connection ID are possible. For example, after a period of
network inactivity, NAT rebinding might cause packets to be sent on a
new path when the client resumes sending.
Before the patch, such packets were rejected with an error in the
ngx_quic_check_migration() function. Removing the check makes the
separate function excessive - remaining checks are early migration
check and "disable_active_migration" check. The latter is a transport
parameter sent to client and it should not be used by server.
The server should send "disable_active_migration" "if the endpoint does
not support active connection migration" (18.2). The support status depends
on nginx configuration: to have migration working with multiple workers,
you need bpf helper, available on recent Linux systems. The patch does
not set "disable_active_migration" automatically and leaves it for the
administrator. By default, active migration is enabled.
RFC 900 says that it is ok to migrate if the peer violates
"disable_active_migration" flag requirements:
If the peer violates this requirement,
the endpoint MUST either drop the incoming packets on that path without
generating a Stateless Reset
OR
proceed with path validation and allow the peer to migrate. Generating a
Stateless Reset or closing the connection would allow third parties in the
network to cause connections to close by spoofing or otherwise manipulating
observed traffic.
So, nginx adheres to the second option and proceeds to path validation.
Note:
The ngtcp2 may be used for testing both active migration and NAT rebinding:
ngtcp2/client --change-local-addr=200ms --delay-stream=500ms <ip> <port> <url>
ngtcp2/client --change-local-addr=200ms --delay-stream=500ms --nat-rebinding \
<ip> <port> <url>
Single UDP datagram may contain multiple QUIC datagrams. In order to
facilitate handling of such cases, 'first' flag in the ngx_quic_header_t
structure is introduced.
Previously, the retired socket was not closed if it didn't match
active or backup.
New sockets could not be created (due to count limit), since retired socket
was not closed before calling ngx_quic_create_sockets().
When replacing retired socket, new socket is only requested after closing
old one, to avoid hitting the limit on the number of active connection ids.
Together with added restrictions, this fixes an issue when a current socket
could be closed during migration, recreated and erroneously reused leading
to null pointer dereference.
Previously the frame was not handled and connection was closed with an error.
Now, after receiving this frame, global flow control is updated and new
flow control credit is sent to client.
Previously, after receiving STREAM_DATA_BLOCKED, current flow control limit
was sent to client. Now, if the limit can be updated to the full window size,
it is updated and the new value is sent to client, otherwise nothing is sent.
The change lets client update flow control credit on demand. Also, it saves
traffic by not sending MAX_STREAM_DATA with the same value twice.
The reasons why a stream may not be created by server currently include hitting
worker_connections limit and memory allocation error. Previously in these
cases the entire QUIC connection was closed and all its streams were shut down.
Now the new stream is rejected and existing streams continue working.
To reject an HTTP/3 request stream, RESET_STREAM and STOP_SENDING with
H3_REQUEST_REJECTED error code are sent to client. HTTP/3 uni streams and
Stream streams are not rejected.
As per quic-qpack-21:
When a stream is reset or reading is abandoned, the decoder emits a
Stream Cancellation instruction.
Previously the instruction was not sent. Now it's sent when closing QUIC
stream connection if dynamic table capacity is non-zero and eof was not
received from client. The latter condition means that a trailers section
may still be on its way from client and the stream needs to be cancelled.
A QUIC stream connection is treated as reusable until first bytes of request
arrive, which is also when the request object is now allocated. A connection
closed as a result of draining, is reset with the error code
H3_REQUEST_REJECTED. Such behavior is allowed by quic-http-34:
Once a request stream has been opened, the request MAY be cancelled
by either endpoint. Clients cancel requests if the response is no
longer of interest; servers cancel requests if they are unable to or
choose not to respond.
When the server cancels a request without performing any application
processing, the request is considered "rejected." The server SHOULD
abort its response stream with the error code H3_REQUEST_REJECTED.
The client can treat requests rejected by the server as though they had
never been sent at all, thereby allowing them to be retried later.
When an HTTP/3 function returns an error in context of a QUIC stream, it's
this function's responsibility now to finalize the entire QUIC connection
with the right code, if required. Previously, QUIC connection finalization
could be done both outside and inside such functions. The new rule follows
a similar rule for logging, leads to cleaner code, and allows to provide more
details about the error.
While here, a few error cases are no longer treated as fatal and QUIC connection
is no longer finalized in these cases. A few other cases now lead to
stream reset instead of connection finalization.
The PATH_RESPONSE frame must be expanded to 1200, except the case
when anti-amplification limit is in effect, i.e. on unvalidated paths.
Previously, the anti-amplification limit was always applied.
If client ID was never used, its refcount is zero. To keep things simple,
the ngx_quic_unref_client_id() function is now aware of such IDs.
If client ID was used, the ngx_quic_replace_retired_client_id() function
is supposed to find all users and unref the ID, thus ngx_quic_unref_client_id()
should not be called after it.
Previously, it was not enforced in the stream module.
Now, since b9e02e9b2f1d it is possible to specify protocols.
Since ALPN is always required, the 'require_alpn' setting is now obsolete.
The "min" and "max" arguments refer to UDP datagram size. Generating payload
requires to account properly for header size, which is variable and depends on
payload size and packet number.
After fe919fd63b0b, processing QUIC streams was postponed until after handshake
completion, which means that 0-RTT is effectively off. With ssl_ocsp enabled,
it could be further delayed. This differs from how OCSP validation works with
SSL_read_early_data(). With this change, processing QUIC streams is unlocked
when obtaining 0-RTT secret.
The sent queue is sorted by packet number. It is possible to avoid
traversing full queue while handling ack ranges. It makes sense to
start traversing from the queue head (i.e. check oldest packets first).
With this patch, all traffic over a QUIC connection is compared to traffic
over QUIC streams. As long as total traffic is many times larger than stream
traffic, we consider this to be a flood.
With this patch, all traffic over HTTP/3 bidi and uni streams is counted in
the h3c->total_bytes field, and payload traffic is counted in the
h3c->payload_bytes field. As long as total traffic is many times larger than
payload traffic, we consider this to be a flood.
Request header traffic is counted as if all fields are literal. Response
header traffic is counted as is.
Checking the reset after encryption avoids false positives. More importantly,
it avoids the check entirely in the usual case where decryption succeeds.
RFC 9000, 10.3.1 Detecting a Stateless Reset
Endpoints MAY skip this check if any packet from a datagram is
successfully processed.
As per RFC 9000:
An endpoint that receives a STOP_SENDING frame MUST send a RESET_STREAM
frame if the stream is in the "Ready" or "Send" state.
An endpoint SHOULD copy the error code from the STOP_SENDING frame to
the RESET_STREAM frame it sends, but it can use any application error code.
The flag indicates that the entire response was sent to the socket up to the
last_buf flag. The flag is only usable for protocol implementations that call
ngx_http_write_filter() from header filter, such as HTTP/1.x and HTTP/3.
Similar to the previous change, a segmentation fault occurres when evaluating
SSL certificates on a QUIC connection due to an uninitialized stream session.
The fix is to adjust initializing the QUIC part of a connection until after
it has session and variables initialized.
Similarly, this appends logging error context for QUIC connections:
- client 127.0.0.1:54749 connected to 127.0.0.1:8880 while handling frames
- quic client timed out (60: Operation timed out) while handling quic input
A QUIC connection doesn't have c->log->data and friends initialized to sensible
values. Yet, a request can be created in the certificate callback with such an
assumption, which leads to a segmentation fault due to null pointer dereference
in ngx_http_free_request(). The fix is to adjust initializing the QUIC part of
a connection such that it has all of that in place.
Further, this appends logging error context for unsuccessful QUIC handshakes:
- cannot load certificate .. while handling frames
- SSL_do_handshake() failed .. while sending frames
OpenSSL library QUIC support cannot be tested at configure time when
using the --with-openssl option so assume it's present if requested.
While here, fixed the error message in case QUIC support is missing.
Previously the counter was not incremented for HTTP/3 streams, but still
decremented in ngx_http_close_connection(). There are two solutions here, one
is to increment the counter for HTTP/3 streams, and the other one is not to
decrement the counter for HTTP/3 streams. The latter solution looks
inconsistent with ngx_stat_reading/ngx_stat_writing, which are incremented on a
per-request basis. The change adds ngx_stat_active increment for HTTP/3
request and push streams.
Notably, it is to avoid setting the TCP_NODELAY flag for QUIC streams
in ngx_http_upstream_send_response(). It is an invalid operation on
inherently SOCK_DGRAM sockets, which leads to QUIC connection close.
The change reduces diff to the default branch in stream content phase.
This function was only referenced from ngx_http_v3_create_push_request() to
initialize push connection log. Now the log handler is copied from the parent
request connection.
The change reduces diff to the default branch.
The functions ngx_quic_handle_read_event() and ngx_quic_handle_write_event()
are added. Previously this code was a part of ngx_handle_read_event() and
ngx_handle_write_event().
The change simplifies ngx_handle_read_event() and ngx_handle_write_event()
by moving QUIC-related code to a QUIC source file.
Previously it had -1 as fd. This fixes proxying, which relies on downstream
connection having a real fd. Also, this reduces diff to the default branch for
ngx_close_connection().
If a MAX_DATA frame was received before any stream was created, then the worker
process would crash in nginx_quic_handle_max_data_frame() while traversing the
stream tree. The issue is solved by adding a check that makes sure the tree is
not empty.
Previously, when cleaning up a QUIC stream in shutdown mode,
ngx_quic_shutdown_quic() was called, which could close the QUIC connection
right away. This could be a problem if the connection was referenced up the
stack. For example, this could happen in ngx_quic_init_streams(),
ngx_quic_close_streams(), ngx_quic_create_client_stream() etc.
With a typical HTTP/3 client the issue is unlikely because of HTTP/3 uni
streams which need a posted event to close. In this case QUIC connection
cannot be closed right away.
Now QUIC connection read event is posted and it will shut down the connection
asynchronously.
After receiving GOAWAY, client is not supposed to create new streams. However,
until client reads this frame, we allow it to create new streams, which are
gracefully rejected. To prevent client from abusing this algorithm, a new
limit is introduced. Upon reaching keepalive_requests * 2, server now closes
the entire QUIC connection claiming excessive load.
The "hq" mode is HTTP/0.9-1.1 over QUIC. The following limits are introduced:
- uni streams are not allowed
- keepalive_requests is enforced
- keepalive_time is enforced
In case of error, QUIC connection is finalized with 0x101 code. This code
corresponds to HTTP/3 General Protocol Error.
Previously, in-flight byte counter and congestion window were properly
maintained, but the limit was not properly implemented.
Now a new datagram is sent only if in-flight byte counter is less than window.
The limit is datagram-based, which means that a single datagram may lead to
exceeding the limit, but the next one will not be sent.
Previously, the error was ignored leading to unnecessary retransmits.
Now, unsent frames are returned into output queue, state is reset, and
timer is started for the next send attempt.
As per quic-http-34:
Endpoints SHOULD create the HTTP control stream as well as the
unidirectional streams required by mandatory extensions (such as the
QPACK encoder and decoder streams) first, and then create additional
streams as allowed by their peer.
Previously, client could create and destroy additional uni streams unlimited
number of times before creating mandatory streams.
OpenSSL is known to provide read keys for an encryption level before the
level is active in TLS, following the old BoringSSL API. In BoringSSL,
it was then fixed to defer releasing read keys until QUIC may use them.
The directive enables usage of UDP segmentation offloading by quic.
By default, gso is disabled since it is not always operational when
detected (depends on interface configuration).
To improve output performance, UDP segmentation offloading is used
if available. If there is a significant amount of data in an output
queue and path is verified, QUIC packets are not sent one-by-one,
but instead are collected in a buffer, which is then passed to kernel
in a single sendmsg call, using UDP GSO. Such method greatly decreases
number of system calls and thus system load.
Additionally, the ngx_init_srcaddr_cmsg() function is introduced which
initializes control message with connection local address.
The NGX_HAVE_ADDRINFO_CMSG macro is defined when at least one of methods
to deal with corresponding control message is available.
Sometimes, QUIC packets need to be of certain (or minimal) size. This is
achieved by adding PADDING frames. It is possible, that adding padding will
affect header size, thus forcing us to recalculate padding size once more.
Renamed header -> field per quic-qpack naming convention, in particular:
- Header Field -> Field Line
- Header Block -> (Encoded) Field Section
- Without Name Reference -> With Literal Name
- Header Acknowledgement -> Section Acknowledgment
As per quic-http-34, these are the cases when this error should be generated:
If an endpoint receives a second SETTINGS frame
on the control stream, the endpoint MUST respond with a connection
error of type H3_FRAME_UNEXPECTED
SETTINGS frames MUST NOT be sent on any stream other than the control
stream. If an endpoint receives a SETTINGS frame on a different
stream, the endpoint MUST respond with a connection error of type
H3_FRAME_UNEXPECTED.
A client MUST NOT send a PUSH_PROMISE frame. A server MUST treat the
receipt of a PUSH_PROMISE frame as a connection error of type
H3_FRAME_UNEXPECTED; see Section 8.
The MAX_PUSH_ID frame is always sent on the control stream. Receipt
of a MAX_PUSH_ID frame on any other stream MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED.
Receipt of an invalid sequence of frames MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED; see Section 8. In
particular, a DATA frame before any HEADERS frame, or a HEADERS or
DATA frame after the trailing HEADERS frame, is considered invalid.
A CANCEL_PUSH frame is sent on the control stream. Receiving a
CANCEL_PUSH frame on a stream other than the control stream MUST be
treated as a connection error of type H3_FRAME_UNEXPECTED.
The GOAWAY frame is always sent on the control stream.
The quic-http-34 is ambiguous as to what error should be generated for the
first frame in control stream:
Each side MUST initiate a single control stream at the beginning of
the connection and send its SETTINGS frame as the first frame on this
stream. If the first frame of the control stream is any other frame
type, this MUST be treated as a connection error of type
H3_MISSING_SETTINGS.
If a DATA frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED.
If a HEADERS frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED.
Previously, H3_FRAME_UNEXPECTED had priority, but now H3_MISSING_SETTINGS has.
The arguments in the spec sound more compelling for H3_MISSING_SETTINGS.
- Function ngx_quic_control_flow() is introduced. This functions does
both MAX_DATA and MAX_STREAM_DATA flow controls. The function is called
from STREAM and RESET_STREAM frame handlers. Previously, flow control
was only accounted for STREAM. Also, MAX_DATA flow control was not accounted
at all.
- Function ngx_quic_update_flow() is introduced. This function advances flow
control windows and sends MAX_DATA/MAX_STREAM_DATA. The function is called
from RESET_STREAM frame handler, stream cleanup handler and stream recv()
handler.
When starting processing a new encoder instruction, the header state is not
memzero'ed because generally it's burdensome. If the header value is empty,
this resulted in inserting a stale value left from the previous instruction.
Based on a patch by Zhiyong Sun.
As per quic-transport 34, FINAL_SIZE_ERROR is generated if an endpoint received
a STREAM frame or a RESET_STREAM frame containing a final size that was lower
than the size of stream data that was already received.
Generic function ngx_quic_order_bufs() is introduced. This function creates
and maintains a chain of buffers with holes. Holes are marked with b->sync
flag. Several buffers and holes in this chain may share the same underlying
memory buffer.
When processing STREAM frames with this function, frame data is copied only
once to the right place in the stream input chain. Previously data could
be copied twice. First when buffering an out-of-order frame data, and then
when filling stream buffer from ordered frame queue. Now there's only one
data chain for both tasks.
According to profiling, those two are among most frequently called,
so inlining is generally useful, and unrolling should help with it.
Further, this fixes undefined behaviour seen with invalid values.
Inspired by Yu Liu.
When using server push, a segfault occured because
ngx_http_v3_create_push_request() accessed ngx_http_v3_session_t object the old
way. Prior to 9ec3e71f8a61, HTTP/3 session was stored directly in c->data.
Now it's referenced by the v3_session field of ngx_http_connection_t.
Client IDs cannot be reused on different paths. This change allows to reuse
client id previosly seen on the same path (but with different dcid) in case
when no unused client IDs are available.
According to quic-transport, 9.1:
PATH_CHALLENGE, PATH_RESPONSE, NEW_CONNECTION_ID, and PADDING frames
are "probing frames", and all other frames are "non-probing frames".
Previously it was parsed in ngx_http_v3_streams.c, while the streams were
parsed in ngx_http_v3_parse.c. Now all parsing is done in one file. This
simplifies parsing API and cleans up ngx_http_v3_streams.c.
Previously, an ngx_http_v3_connection_t object was created for HTTP/3 and
then assinged to c->data instead of the generic ngx_http_connection_t object.
Now a direct reference is added to ngx_http_connection_t, which is less
confusing and does not require a flag for http3.
It's used instead of accessing c->quic->parent->data directly. Apart from being
simpler, it allows to change the way session is stored in the future by changing
the macro.
Now ngx_http_v3_ack_header() and ngx_http_v3_inc_insert_count() always generate
decoder error. Our implementation does not use dynamic tables and does not
expect client to send Section Acknowledgement or Insert Count Increment.
Stream Cancellation, on the other hand, is allowed to be sent anyway. This is
why ngx_http_v3_cancel_stream() does not return an error.
The patch adds proper transitions between multiple networking addresses that
can be used by a single quic connection. New networking paths are validated
using PATH_CHALLENGE/PATH_RESPONSE frames.
7.2.1:
If a DATA frame is received on a control stream, the recipient MUST
respond with a connection error of type H3_FRAME_UNEXPECTED;
7.2.2:
If a HEADERS frame is received on a control stream, the recipient MUST
respond with a connection error (Section 8) of type H3_FRAME_UNEXPECTED.
Currently both names are used which is confusing. Historically these were
different objects, but now it's the same one. The name qs (quic stream) makes
more sense than sn (stream node).
PATH_RESPONSE was explicitly forbidden in 0-RTT since at least draft-22, but
the Frame Types table was not updated until recently while in IESG evaluation.
Stop including QUIC headers with no user-serviceable parts inside.
This allows to provide a much cleaner QUIC interface. To cope with that,
ngx_quic_derive_key() is now explicitly exported for v3 and quic modules.
Additionally, this completely hides the ngx_quic_keys_t internal type.
The "ngx_event_quic.h" header file now contains only public definitions,
used by modules. All internal definitions are moved into
the "ngx_event_quic_connection.h" header file.
The function correctly cleans up resources in case of failure to create
initial server id: it removes previously created udp node for odcid from
listening rbtree.
Currently listener contains rbtree with multiple nodes for single QUIC
connection: each corresponding to specific server id. Each udp node points
to same ngx_connection_t, which points to QUIC connection via c->udp field.
Thus when an event handler is called, it only gets ngx_connection_t with
c->udp pointing to QUIC connection. This makes it hard to obtain actual
node which was used to dispatch packet (it requires to repeat DCID lookup).
Additionally, ngx_quic_connection_t->udp field is only needed to keep a
pointer in c->udp. The node is not added into the tree and does not carry
useful information.
Sometimes it is required to process datagram properties at higher level (i.e.
QUIC is interested in source address which may change and IP options). The
patch adds ngx_udp_dgram_t structure used to pass packet-related information
in c->udp.
Previously, when a new datagram arrived, data were copied from the UDP layer
to the QUIC layer via c->recv() interface. Now UDP buffer is accessed
directly.
OpenSSL 3.0 started to require HKDF-Extract output PRK length pointer
used to represent the amount of data written to contain the length of
the key buffer before the call. EVP_PKEY_derive() documents this.
See HKDF_Extract() internal implementation update in this change:
https://github.com/openssl/openssl/commit/5a285ad
The function ngx_quic_shutdown_connection() waits until all non-cancelable
streams are closed, and then closes the connection. In HTTP/3 cancelable
streams are all unidirectional streams except push streams.
The function is called from HTTP/3 when client reaches keepalive_requests.
In case of long header packets, dcid length was not read correctly.
While there, macros to parse uint64 was fixed as well as format specifiers
to print it in debug mode.
Thanks to Gao Yan <gaoyan09@baidu.com>.
As per quic-transport-34:
An endpoint also restarts its idle timer when sending an ack-eliciting
packet if no other ack-eliciting packets have been sent since last receiving
and processing a packet.
Previously, the timer was set for any packet.
The structure is used to parse an HTTP/3 request. An object of this type is
added to ngx_http_request_t instead of h3_parse generic pointer.
Also, the new field is located outside of the request ephemeral zone to keep it
safe after request headers are parsed.
The "max_ack_delay", "ack_delay_exponent", and "max_udp_payload_size"
transport parameters were not communicated to client.
The "disable_active_migration" and "active_connection_id_limit"
parameters were not saved into zero-rtt context.
18.1. Reserved Transport Parameters
Transport parameters with an identifier of the form "31 * N + 27" for
integer values of N are reserved to exercise the requirement that
unknown transport parameters be ignored. These transport parameters
have no semantics, and can carry arbitrary values.
Setting the timer is brought into compliance with quic-recovery-34. Now it's
set from a single function ngx_quic_set_lost_timer() that takes into account
both loss detection and PTO. The following issues are fixed with this change:
- when in loss detection mode, discarding a context could turn off the
timer forever after switching to the PTO mode
- when in loss detection mode, sending a packet resulted in rescheduling the
timer as if it's always in the PTO mode
As per quic-transport-33:
An endpoint MUST acknowledge all ack-eliciting Initial and Handshake
packets immediately
If a packet carrying Initial or Handshake ACK was lost, a non-immediate ACK
should not be sent later. Instead, client is expected to send a new packet
to acknowledge.
Sending non-immediate ACKs for Initial packets can cause the client to
generate an inflated RTT sample.
The token generation in QUIC is reworked. Single host key is used to generate
all required keys of needed sizes using HKDF.
The "quic_stateless_reset_token_key" directive is removed. Instead, the
"quic_host_key" directive is used, which reads key from file, or sets it
to random bytes if not specified.
The flag was introduced to create type-aware CONNECTION_CLOSE frames,
and now is replaced with frame type information, directly accessible.
Notably, this fixes type logging for received frames in b3d9e57d0f62.
The flag is used in ngx_http_finalize_connection() to switch client connection
to the keepalive mode. Since eaea7dac3292 this code is not executed for HTTP/3
which allows us to revert the change and get back to the default branch code.
The change reduces diff to the default branch for
src/http/ngx_http_request_body.c.
Also, client Content-Length, if present, is now checked against the real body
size sent by client.
- split ngx_quic_process_packet() in two functions with the second one called
ngx_quic_process_payload() in charge of decrypring and handling the payload
- renamed ngx_quic_payload_handler() to ngx_quic_handle_frames()
- moved error cleanup from ngx_quic_input() to ngx_quic_process_payload()
- moved handling closed connection from ngx_quic_handle_frames() to
ngx_quic_process_payload()
- minor fixes
Previously, quic connection object was created when Retry packet was sent.
This is neither necessary nor convenient, and contradicts the idea of retry:
protecting from bad clients and saving server resources.
Now, the connection is not created, token is verified cryptographically
instead of holding it in connection.
This function should be called at the end of an event handler to prepare the
event for the next handler call. Particularly, the "active" flag is set or
cleared depending on data availability.
With this call missing in one code path, read handler was not called again
after handling the initial part of the client request, if the request was too
big to fit into a single STREAM frame.
Now ngx_handle_read_event() is called in this code path. Also, read timer is
restarted.
A header with the name containing null, CR, LF, colon or uppercase characters,
is now considered an error. A header with the value containing null, CR or LF,
is also considered an error.
Also, header is considered invalid unless its name only contains lowercase
characters, digits, minus and optionally underscore. Such header can be
optionally ignored.
- :method, :path and :scheme are expected exactly once and not empty
- :method and :scheme character validation is added
- :authority cannot appear more than once
Notably, the version negotiation table is updated to reject draft-33/QUICv1
(which requires a new TLS codepoint) unless explicitly asked to built with.
The quic kernel bpf helper inspects packet payload for DCID, extracts key
and routes the packet into socket matching the key.
Due to reuseport feature, each worker owns a personal socket, which is
identified by the same key, used to create DCID.
BPF objects are locked in RAM and are subject to RLIMIT_MEMLOCK.
The "ulimit -l" command may be used to setup proper limits, if maps
cannot be created with EPERM or updated with ETOOLONG.
Previously, when processing client ACK, rtt could be calculated for a packet
different than the largest if it was missing in the sent chain. Even though
this is an unlikely situation, rtt based on a different packet could be larger
than needed leading to bigger pto timeout and performance degradation.
Previously, this only worked for Application level because before
quic-transport-30, there were the following constraints:
Because the receiver doesn't use the ACK Delay for Initial and Handshake
packets, a sender SHOULD send a value of 0.
When adjusting an RTT sample using peer-reported acknowledgement delays, an
endpoint ... MUST ignore the ACK Delay field of the ACK frame for packets
sent in the Initial and Handshake packet number space.
Now initial output packet is not padded anymore if followed by a handshake
packet. If the datagram is still not big enough to satisfy minimum size
requirements, handshake packet is padded.
The patch replaces c->send() occurences with c->send_chain(), because the
latter accounts for the local address, which may be different if the wildcard
listener is used.
Previously, server sent response to client using address different from
one client connected to.
Notably, this fixes an issue with Chrome that can emit a "certificate_unknown"
alert during the SSL handshake where c->ssl->no_wait_shutdown is not yet set.
The filter is responsible for creating HTTP/3 response header and body.
The change removes differences to the default branch for
ngx_http_chunked_filter_module and ngx_http_header_filter_module.
Instead, appropriate format specifier for hexadecimal is used
in ngx_log_debug().
The STREAM frame "data" debug is moved into ngx_quic_log_frame(), similar
to all other frame fields debug.
Header value returned from the HTTP parser is expected to be null-terminated or
have a spare byte after the value bytes. When an empty header value was passed
by client in a literal header representation, neither was true. This could
result in segfault. The fix is to assign a literal empty null-terminated
string in this case.
Thanks to Andrey Kolyshkin.
The largely duplicate type-specific functions ngx_quic_parse_initial_header(),
ngx_quic_parse_handshake_header(), and a missing one for 0-RTT, were merged.
The new order of functions listed in ngx_event_quic_transport.c reflects this.
|_ ngx_quic_parse_long_header - version-invariant long header fields
\_ ngx_quic_supported_version - a helper to decide we can go further
\_ ngx_quic_parse_long_header_v1 - QUICv1-specific long header fields
0-RTT packets previously appeared as Handshake are now logged as appropriate:
*1 quic packet rx long flags:db version:ff00001d
*1 quic packet rx early len:870
Logging SCID/DCID is no longer duplicated as were seen with Initial packets.
If trailers were missing and a chain carrying the last_buf flag had no data
in it, then last HTTP/1 chunk was broken. The problem was introduced while
implementing HTTP/3 response body generation.
The change fixes the issue and reduces diff to the mainline nginx.
Previously, if quic_stateless_reset_token_key was empty or unspecified,
initial stateless reset token was not generated. However subsequent tokens
were generated with empty key, which resulted in error with certain SSL
libraries, for example OpenSSL.
Now a random 32-byte stateless reset token key is generated if none is
specified in the configuration. As a result, stateless reset tokens are now
generated for all server ids.
Previously new dcid was generated in the same memory that was allocated for
qc->dcid when creating the QUIC connection. However this memory was also
referenced by initial_source_connection_id and retry_source_connection_id
transport parameters. As a result these parameters changed their values after
retry which broke the protocol.
A negotiated version is decoupled from NGX_QUIC_VERSION and, if supported,
now stored in c->quic->version after packets processing. It is then used
to create long header packets. Otherwise, the list of supported versions
(which may be many now) is sent in the Version Negotiation packet.
All packets in the connection are expected to have the same version.
Incoming packets with mismatched version are now rejected.
A number of unsigned variables has a special value, usually -1 or some maximum,
which produces huge numeric value in logs and makes them hard to read.
In order to distinguish such values in log, they are casted to the signed type
and printed as literal '-1'.
The client address validation didn't complete with a valid token,
which was broken after packet processing refactoring in d0d3fc0697a0.
An invalid or expired token was treated as a connection error.
Now we proceed as outlined in draft-ietf-quic-transport-32,
section 8.1.3 "Address Validation for Future Connections" below,
which is unlike validating the client address using Retry packets.
When a server receives an Initial packet with an address validation
token, it MUST attempt to validate the token, unless it has already
completed address validation. If the token is invalid then the
server SHOULD proceed as if the client did not have a validated
address, including potentially sending a Retry.
The connection is now closed in this case on internal errors only.
All key handling functionality is moved into ngx_quic_protection.c.
Public structures from ngx_quic_protection.h are now private and new
methods are available to manipulate keys.
A negotiated cipher is cached in QUIC connection from the set secret callback
to avoid calling SSL_get_current_cipher() on each encrypt/decrypt operation.
This also reduces the number of unwanted c->ssl->connection occurrences.
Previously, tx ACK frames held ranges in an array of ngx_quic_ack_range_t,
while rx ACK frames held ranges in the serialized format. Now serialized format
is used for both types of frames.
The patch resets ctx->frames queue, which may contain frames. It was possible
that congestion or amplification limits prevented all frames to be sent.
Retransmitted frames could be accounted twice as inflight: first time in
ngx_quic_congestion_lost() called from ngx_quic_resend_frames(), and later
from ngx_quic_discard_ctx().
This accounts for the following change:
* Require expansion of datagrams to ensure that a path supports at
least 1200 bytes:
- During the handshake ack-eliciting Initial packets from the
server need to be expanded
All values are prefixed with name and separated from it using colon.
Multiple values are listed without commas in between.
Rationale: this greatly simplifies log parsing for analysis.
For application level packets, only every second packet is now acknowledged,
respecting max ack delay.
13.2.1 Sending ACK Frames
In order to assist loss detection at the sender, an endpoint SHOULD
generate and send an ACK frame without delay when it receives an ack-
eliciting packet either:
* when the received packet has a packet number less than another
ack-eliciting packet that has been received, or
* when the packet has a packet number larger than the highest-
numbered ack-eliciting packet that has been received and there are
missing packets between that packet and this packet.
13.2.2. Acknowledgement Frequency
A receiver SHOULD send an ACK frame after receiving at least two
ack-eliciting packets.
13.2.4. Limiting Ranges by Tracking ACK Frames
When a packet containing an ACK frame is sent, the largest
acknowledged in that frame may be saved. When a packet containing an
ACK frame is acknowledged, the receiver can stop acknowledging
packets less than or equal to the largest acknowledged in the sent
ACK frame.
The history of acknowledged packet is kept in send context as ranges.
Up to NGX_QUIC_MAX_RANGES ranges is stored.
As a result, instead of separate ack frames, single frame with ranges
is sent.
Per draft-ietf-quic-transport-32 on the topic:
: Similarly, a server MUST expand the payload of all UDP datagrams carrying
: ack-eliciting Initial packets to at least the smallest allowed maximum
: datagram size of 1200 bytes.
The history of acknowledged packet is kept in send context as ranges.
Up to NGX_QUIC_MAX_RANGES ranges is stored.
As a result, instead of separate ack frames, single frame with ranges
is sent.
Previously, this field was not set while creating a QUIC stream connection.
As a result, calling ngx_connection_local_sockaddr() led to getsockname()
bad descriptor error.
If client acknowledged an Initial packet with CRYPTO frame and then
sent another Initial packet containing duplicate CRYPTO again, this
could result in resending frames off the empty send queue.
The new "quic_stateless_reset_token_key" directive is added. It sets the
endpoint key used to generate stateless reset tokens and enables feature.
If the endpoint receives short-header packet that can't be matched to
existing connection, a stateless reset packet is generated with
a proper token.
If a valid stateless reset token is found in the incoming packet,
the connection is closed.
Example configuration:
http {
quic_stateless_reset_token_key "foo";
...
}
The flag is tied to the initial secret creation. The presence of c->quic
pointer is sufficient to enable execution of ngx_quic_close_quic().
The ngx_quic_new_connection() function now returns the allocated quic
connection object and the c->quic pointer is set by the caller.
If an early error occurs before secrets initialization (i.e. in cases
of invalid retry token or nginx exiting), it is still possible to
generate an error response by trying to initialize secrets directly
in the ngx_quic_send_cc() function.
Before the change such early errors failed to send proper connection close
message and logged an error.
An auxilliary ngx_quic_init_secrets() function is introduced to avoid
verbose call to ngx_quic_set_initial_secret() requiring local variable.
All packet header parsing is now performed by ngx_quic_parse_packet()
function, located in the ngx_quic_transport.c file.
The packet processing is centralized in the ngx_quic_process_packet()
function which decides if the packet should be accepted, ignored or
connection should be closed, depending on the connection state.
As a result of refactoring, behavior has changed in some places:
- minimal size of Initial packet is now always tested
- connection IDs are always tested in existing connections
- old keys are discarded on encryption level switch
Now flags are processed in ngx_quic_input(), and raw->pos points to the first
byte after the flags. Redundant checks from ngx_quic_parse_short_header() and
ngx_quic_parse_long_header() are removed.
Previously, when a packet was declared lost, another packet was sent with the
same frames. Now lost frames are moved to the output frame queue and push
event is posted. This has the advantage of forming packets with more frames
than before.
Also, the start argument is removed from the ngx_quic_resend_frames()
function as excess information.
Previously the default server configuration context was used until the
:authority or host header was parsed. This led to using the configuration
parameters like client_header_buffer_size or request_pool_size from the default
server rather than from the server selected by SNI.
Also, the switch to the right server log is implemented. This issue manifested
itself as QUIC stream being logged to the default server log until :authority
or host is parsed.
Initially, client certificate verification didn't work due to the missing
hc->ssl on a QUIC stream, which is started to be set in 7738:7f0981be07c4.
Then it was lost in 7999:0d2b2664b41c introducing "quic" listen parameter.
This change re-adds hc->ssl back for all QUIC connections, similar to SSL.
As per HTTP/3 draft 30, section 7.2.8:
Frame types that were used in HTTP/2 where there is no corresponding
HTTP/3 frame have also been reserved (Section 11.2.1). These frame
types MUST NOT be sent, and their receipt MUST be treated as a
connection error of type H3_FRAME_UNEXPECTED.
As per HTTP/3 draft 29, section 4.1:
Frames of unknown types (Section 9), including reserved frames
(Section 7.2.8) MAY be sent on a request or push stream before,
after, or interleaved with other frames described in this section.
Also, trailers frame is now used as an indication of the request body end.
While for HTTP/1 unexpected eof always means an error, for HTTP/3 an eof right
after a DATA frame end means the end of the request body. For this reason,
since adding HTTP/3 support, eof no longer produced an error right after recv()
but was passed to filters which would make a decision. This decision was made
in ngx_http_parse_chunked() and ngx_http_v3_parse_request_body() based on the
b->last_buf flag.
Now that since 0f7f1a509113 (1.19.2) rb->chunked->length is a lower threshold
for the expected number of bytes, it can be set to zero to indicate that more
bytes may or may not follow. Now it's possible to move the check for eof from
parser functions to ngx_http_request_body_chunked_filter() and clean up the
parsing code.
Also, in the default branch, in case of eof, the following three things
happened, which were replaced with returning NGX_ERROR while implementing
HTTP/3:
- "client prematurely closed connection" message was logged
- c->error flag was set
- NGX_HTTP_BAD_REQUEST was returned
The change brings back this behavior for HTTP/1 as well as HTTP/3.
If a packet sent in response to an initial client packet was lost, then
successive client initial packets were dropped by nginx with the unexpected
dcid message logged. This was because the new DCID generated by the server was
not available to the client.
The check tested the total size of a packet header and unprotected packet
payload, which doesn't include the packet number length and expansion of
the packet protection AEAD. If the packet was corrupted, it could cause
false triggering of the condition due to unsigned type underflow leading
to a connection error.
Existing checks for the QUIC header and protected packet payload lengths
should be enough.
From quic-tls draft, section 5.4.2:
An endpoint MUST discard packets that are not long enough to contain
a complete sample.
The check includes the Packet Number field assumed to be 4 bytes long.
During long packet header parsing, pkt->len is updated with the Length
field value that is used to find next coalesced packets in a datagram.
For short packets it still contained the whole QUIC packet size.
This change uniforms packet length handling to always contain the total
length of the packet number and protected packet payload in pkt->len.
Previously STOP_SENDING was sent to client upon stream closure if rev->eof and
rev->error were not set. This was an indirect indication that no RESET_STREAM
or STREAM fin has arrived. But it is indeed possible that rev->eof is not set,
but STREAM fin has already been received, just not read out by the application.
In this case sending STOP_SENDING does not make sense and can be misleading for
some clients.
The peer may issue additional connection IDs up to the limit defined by
transport parameter "active_connection_id_limit", using NEW_CONNECTION_ID
frames, and retire such IDs using RETIRE_CONNECTION_ID frame.
It is required to distinguish internal errors from corrupted packets and
perform actions accordingly: drop the packet or close the connection.
While there, made processing of ngx_quic_decrypt() erorrs similar and
removed couple of protocol violation errors.
quic-transport
5.2:
Packets that are matched to an existing connection are discarded if
the packets are inconsistent with the state of that connection.
5.2.2:
Servers MUST drop incoming packets under all other circumstances.
The removal of QUIC packet protection depends on the largest packet number
received. When a garbage packet was received, the decoder still updated the
largest packet number from that packet. This could affect removing protection
from subsequent QUIC packets.
As per HTTP/3 draft 29, section 4.1:
When the server does not need to receive the remainder of the request,
it MAY abort reading the request stream, send a complete response, and
cleanly close the sending part of the stream.
On QUIC connections, SSL_shutdown() is used to call the send_alert callback
to send a CONNECTION_CLOSE frame. The reverse side is handled by other means.
At least BoringSSL doesn't differentiate whether this is a QUIC SSL method,
so waiting for the peer's close_notify alert should be explicitly disabled.
The logical quic connection state is tested by handler functions that
process corresponding types of packets (initial/handshake/application).
The packet is declined if state is incorrect.
No timeout is required for the input queue.
If a client attemtps to start a new connection with unsupported version,
a version negotiation packet is sent that contains a list of supported
versions (currently this is a single version, selected at compile time).
The function ngx_http_upstream_check_broken_connection() terminates the HTTP/1
request if client sends eof. For QUIC (including HTTP/3) the c->write->error
flag is now checked instead. This flag is set when the entire QUIC connection
is closed or STOP_SENDING was received from client.
Previously the request body DATA frame header was read by one byte because
filters were called only when the requested number of bytes were read. Now,
after 08ff2e10ae92 (1.19.2), filters are called after each read. More bytes
can be read at once, which simplifies and optimizes the code.
This also reduces diff with the default branch.
Previously, such packets weren't handled as the resulting zero remaining time
prevented setting the loss detection timer, which, instead, could be disarmed.
For implementation details, see quic-recovery draft 29, appendix A.10.
The PTO handler is split into separate PTO and loss detection handlers
that operate interchangeably depending on which timer should be set.
The present ngx_quic_lost_handler is now only used for packet loss detection.
It replaces ngx_quic_pto_handler if there are packets preceeding largest_ack.
Once there is no more such packets, ngx_quic_pto_handler is installed again.
Probes carry unacknowledged data previously sent in the oldest packet number,
one per each packet number space. That is, it could be up to two probes.
PTO backoff is now increased before scheduling next probes.
In particular, this prevents declaring packet number 0 as lost if
there aren't yet any acknowledgements in this packet number space.
For example, only Initial packets were acknowledged in handshake.
Previously a single STREAM frame was created for each buffer in stream output
chain which is wasteful with respect to memory. The following changes were
made in the stream send code:
- ngx_quic_stream_send_chain() no longer calls ngx_quic_stream_send() and got
a separate implementation that coalesces neighbouring buffers into a single
frame
- the new ngx_quic_stream_send_chain() respects the limit argument, which fixes
sendfile_max_chunk and limit_rate
- ngx_quic_stream_send() is reimplemented to call ngx_quic_stream_send_chain()
- stream frame size limit is moved out to a separate function
ngx_quic_max_stream_frame()
- flow control is moved out to a separate function ngx_quic_max_stream_flow()
- ngx_quic_stream_send_chain() is relocated next to ngx_quic_stream_send()
Creating client-initiated streams is moved from ngx_quic_handle_stream_frame()
to a separate function ngx_quic_create_client_stream(). This function is
responsible for creating streams with lower ids as well.
Also, simplified and fixed initial data buffering in
ngx_quic_handle_stream_frame(). It is now done before calling the initial
handler as the handler can destroy the stream.
Previously this function generated an error trying to figure out if client shut
down the write end of the connection. The reason for this error was that a
QUIC stream has no socket descriptor. However checking for eof is not the
right thing to do for an HTTP/3 QUIC stream since HTTP/3 clients are expected
to shut down the write end of the stream after sending the request.
Now the function handles QUIC streams separately. It checks if c->read->error
is set. The error flags for c->read and c->write are now set for all streams
when closing the QUIC connection instead of setting the pending_eof flag.
According to quic-transport draft 29, section 19.3.1:
The value of the Gap field establishes the largest packet number
value for the subsequent ACK Range using the following formula:
largest = previous_smallest - gap - 2
Thus, given a largest packet number for the range, the smallest value
is determined by the formula:
smallest = largest - ack_range
While here, changed min/max to uint64_t for consistency.
A QUIC stream could be destroyed by handler while in ngx_quic_stream_input().
To detect this, ngx_quic_find_stream() is used to check that it still exists.
Previously, a stream id was passed to this routine off the frame structure.
In case of stream cleanup, it is freed along with other frames belonging to
the stream on cleanup. Then, a cleanup handler reuses last frames to update
MAX_STREAMS and serve other purpose. Thus, ngx_quic_find_stream() is passed
a reused frame with zeroed out part pointed by stream_id. If a stream with
id 0x0 still exists, this leads to use-after-free.
The limits on active bidi and uni client streams are maintained at their
initial values initial_max_streams_bidi and initial_max_streams_uni by sending
a MAX_STREAMS frame upon each client stream closure.
Also, the following is changed for data arriving to non-existing streams:
- if a stream was already closed, such data is ignored
- when creating a new stream, all streams of the same type with lower ids are
created too
The ngx_http_perl_module module doesn't have a notion of including additional
search paths through --with-cc-opt, which results in compile error incomplete
type 'enum ssl_encryption_level_t' when building nginx without QUIC support.
The enum is visible from quic event headers and eventually pollutes ngx_core.h.
The fix is to limit including headers to compile units that are real consumers.
According to quic-transport draft 29, section 21.12.1.1:
Prior to validation, endpoints are limited in what they are able to
send. During the handshake, a server cannot send more than three
times the data it receives; clients that initiate new connections or
migrate to a new network path are limited.
The ngx_quic_queue_frame() functions puts a frame into send queue and
schedules a push timer to actually send data.
The patch adds tracking for data amount in the queue and sends data
immediately if amount of data exceeds limit.
Instead of timer-based retransmissions with constant packet lifetime,
this patch implements ack-based loss detection and probe timeout
for the cases, when no ack is received, according to the quic-recovery
draft 29.
The c->quic->retransmit timer is now called "pto".
The ngx_quic_retransmit() function is renamed to "ngx_quic_detect_lost()".
This is a preparation for the following patches.
According to the quic-recovery 29, Section 5: Estimating the Round-Trip Time.
Currently, integer arithmetics is used, which loses sub-millisecond accuracy.
Previously, the expression (ch & 0x7f) was promoted to a signed integer.
Depending on the platform, the size of this integer could be less than 8 bytes,
leading to overflow when handling the higher bits of the result. Also, sign
bit of this integer could be replicated when adding to the 64-bit st->value.
Previously errors led only to closing streams.
To simplify closing QUIC connection from a QUIC stream context, new macro
ngx_http_v3_finalize_connection() is introduced. It calls
ngx_quic_finalize_connection() for the parent connection.
The function finalizes QUIC connection with an application protocol error
code and sends a CONNECTION_CLOSE frame with type=0x1d.
Also, renamed NGX_QUIC_FT_CONNECTION_CLOSE2 to NGX_QUIC_FT_CONNECTION_CLOSE_APP.
Previously dynamic table was not functional because of zero limit on its size
set by default. Now the following changes enable it:
- new directives to set SETTINGS_QPACK_MAX_TABLE_CAPACITY and
SETTINGS_QPACK_BLOCKED_STREAMS
- send settings with SETTINGS_QPACK_MAX_TABLE_CAPACITY and
SETTINGS_QPACK_BLOCKED_STREAMS to the client
- send Insert Count Increment to the client
- send Header Acknowledgement to the client
- evict old dynamic table entries on overflow
- decode Required Insert Count from client
- block stream if Required Insert Count is not reached
Client streams may send literal strings which are now limited in size by the
new directive. The default value is 4096.
The directive is similar to HTTP/2 directive http2_max_field_size.
So that connections are protected from failing from on-path attacks.
Decryption failure of long packets used during handshake still leads
to connection close since it barely makes sense to handle them there.
A previously used undefined error code is now replaced with the generic one.
Note that quic-transport prescribes keeping connection intact, discarding such
QUIC packets individually, in the sense that coalesced packets could be there.
This is selectively handled in the next change.
The patch removes remnants of the old state tracking mechanism, which did
not take into account assimetry of read/write states and was not very
useful.
The encryption state now is entirely tracked using SSL_quic_read/write_level().
quic-transport draft 29:
section 7:
* authenticated negotiation of an application protocol (TLS uses
ALPN [RFC7301] for this purpose)
...
Endpoints MUST explicitly negotiate an application protocol. This
avoids situations where there is a disagreement about the protocol
that is in use.
section 8.1:
When using ALPN, endpoints MUST immediately close a connection (see
Section 10.3 of [QUIC-TRANSPORT]) with a no_application_protocol TLS
alert (QUIC error code 0x178; see Section 4.10) if an application
protocol is not negotiated.
Changes in ngx_quic_close_quic() function are required to avoid attempts
to generated and send packets without proper keys, what happens in case
of failed ALPN check.
quic-transport draft 29, section 14:
QUIC depends upon a minimum IP packet size of at least 1280 bytes.
This is the IPv6 minimum size [RFC8200] and is also supported by most
modern IPv4 networks. Assuming the minimum IP header size, this
results in a QUIC maximum packet size of 1232 bytes for IPv6 and 1252
bytes for IPv4.
Since the packet size can change during connection lifetime, the
ngx_quic_max_udp_payload() function is introduced that currently
returns minimal allowed size, depending on address family.
quic-tls, 8.2:
The quic_transport_parameters extension is carried in the ClientHello
and the EncryptedExtensions messages during the handshake. Endpoints
MUST send the quic_transport_parameters extension; endpoints that
receive ClientHello or EncryptedExtensions messages without the
quic_transport_parameters extension MUST close the connection with an
error of type 0x16d (equivalent to a fatal TLS missing_extension
alert, see Section 4.10).
This is a temporary workaround, proper retransmission mechanism based on
quic-recovery rfc draft is yet to be implemented.
Currently hardcoded value is too small for real networks. The patch
sets static PTO, considering rtt of ~333ms, what gives about 1s.
Also, if both are present, require that they have the same value. These
requirements are specified in HTTP/3 draft 28.
Current implementation of HTTP/2 treats ":authority" and "Host"
interchangeably. New checks only make sure at least one of these values is
present in the request. A similar check existed earlier and was limited only
to HTTP/1.1 in 38c0898b6df7.
The flags was originally added by 8f038068f4bc, and is propagated correctly
in the stream module. With QUIC introduction, http module now uses datagram
sockets as well, thus the fix.
Preserving pointers within the client buffer is not needed for HTTP/3 because
all data is either allocated from pool or static. Unlike with HTTP/1, data
typically cannot be referenced directly within the client buffer. Trying to
preserve NULLs or external pointers lead to broken pointers.
Also, reverted changes in ngx_http_alloc_large_header_buffer() not relevant
for HTTP/3 to minimize diff to mainstream.
New field r->parse_start is introduced to substitute r->request_start and
r->header_name_start for request length accounting. These fields only work for
this purpose in HTTP/1 because HTTP/1 request line and header line start with
these values.
Also, error logging is now fixed to output the right part of the request.
As per HTTP/3 draft 27, a request or response containing uppercase header
field names MUST be treated as malformed. Also, existing rules applied
when parsing HTTP/1 header names are also applied to HTTP/3 header names:
- null character is not allowed
- underscore character may or may not be treated as invalid depending on the
value of "underscores_in_headers"
- all non-alphanumeric characters with the exception of '-' are treated as
invalid
Also, the r->locase_header field is now filled while parsing an HTTP/3
header.
Error logging for invalid headers is fixed as well.
The first one parses pseudo-headers and is analagous to the request line
parser in HTTP/1. The second one parses regular headers and is analogous to
the header parser in HTTP/1.
Additionally, error handling of client passing malformed uri is now fixed.
The function ngx_http_parse_chunked() is also called from the proxy module to
parse the upstream response. It should always parse HTTP/1 body in this case.
According to quic-transport draft 28 section 10.3.1:
When sending CONNECTION_CLOSE, the goal is to ensure that the peer
will process the frame. Generally, this means sending the frame in a
packet with the highest level of packet protection to avoid the
packet being discarded. After the handshake is confirmed (see
Section 4.1.2 of [QUIC-TLS]), an endpoint MUST send any
CONNECTION_CLOSE frames in a 1-RTT packet. However, prior to
confirming the handshake, it is possible that more advanced packet
protection keys are not available to the peer, so another
CONNECTION_CLOSE frame MAY be sent in a packet that uses a lower
packet protection level.
There is no need in a separate type for the QUIC connection state.
The only state not found in the SSL library is NGX_QUIC_ST_UNAVAILABLE,
which is actually a flag used by the ngx_quic_close_quic() function
to prevent cleanup of uninitialized connection.
Sections 4.10.1 and 4.10.2 of quic transport describe discarding of initial
and handshake keys. Since the keys are discarded, we no longer need
to retransmit packets and corresponding queues should be emptied.
This patch removes previously added workaround that did not require
acknowledgement for initial packets, resulting in avoiding retransmission,
which is wrong because a packet could be lost and we have to retransmit it.
It was possible that retransmit timer was not set after the first
retransmission attempt, due to ngx_quic_retransmit() did not set
wait time properly, and the condition in retransmit handler was incorrect.
Section 17.2 and 17.3 of QUIC transport:
Fixed bit: Packets containing a zero value for this bit are not
valid packets in this version and MUST be discarded.
Reserved bit: An endpoint MUST treat receipt of a packet that has
a non-zero value for these bits, after removing both packet and
header protection, as a connection error of type PROTOCOL_VIOLATION.
When an error occurs, then c->quic->error field may be populated
with an appropriate error code, and the CONNECTION CLOSE frame will be
sent to the peer before the connection is closed. Otherwise, the error
treated as internal and INTERNAL_ERROR code is sent.
The pkt->error field is populated by functions processing packets to
indicate an error when it does not fit into pass/fail return status.
As per QUIC transport, the first flight of 0-RTT packets obviously uses same
Destination and Source Connection ID values as the client's first Initial.
The fix is to match 0-RTT against original DCID after it has been switched.
The ordered frame handler is always called for the existing stream, as it is
allocated from this stream. Instead of searching stream by id, pointer to the
stream node is passed.
The idea is to skip any zeroes that follow valid QUIC packet. Currently such
behavior can be only observed with Firefox which sends zero-padded initial
packets.
Now there's no need to annotate every frame in ACK-eliciting packet.
Sending ACK was moved to the first place, so that queueing ACK frame
no longer postponed up to the next packet after pushing STREAM frames.
+ added "quic" prefix to all error messages
+ rephrased some messages
+ removed excessive error logging from frame parser
+ added ngx_quic_check_peer() function to check proper source/destination
match and do it one place
- the ngx_quic_hexdump0() macro is renamed to ngx_quic_hexdump();
the original ngx_quic_hexdump() macro with variable argument is
removed, extra information is logged normally, with ngx_log_debug()
- all labels in hex dumps are prefixed with "quic"
- the hexdump format is simplified, length is moved forward to avoid
situations when the dump is truncated, and length is not shown
- ngx_quic_flush_flight() function contents is debug-only, placed under
NGX_DEBUG macro to avoid "unused variable" warnings from compiler
- frame names in labels are capitalized, similar to other places
+ all dumps are moved under one of the following macros (undefined by default):
NGX_QUIC_DEBUG_PACKETS
NGX_QUIC_DEBUG_FRAMES
NGX_QUIC_DEBUG_FRAMES_ALLOC
NGX_QUIC_DEBUG_CRYPTO
+ all QUIC debug messages got "quic " prefix
+ all input frames are reported as "quic frame in FOO_FRAME bar:1 baz:2"
+ all outgoing frames re reported as "quic frame out foo bar baz"
+ all stream operations are prefixed with id, like: "quic stream id 0x33 recv"
+ all transport parameters are prefixed with "quic tp"
(hex dump is moved to caller, to avoid using ngx_cycle->log)
+ packet flags and some other debug messages are updated to
include packet type
We always generate stream frames that have length. The 'len' member is used
during parsing incoming frames and can be safely ignored when generating
output.
There are following flags in quic connection:
closing - true, when a connection close is initiated, for whatever reason
draining - true, when a CC frame is received from peer
The following state machine is used for closing:
+------------------+
| I/HS/AD |
+------------------+
| | |
| | V
| | immediate close initiated:
| | reasons: close by top-level protocol, fatal error
| | + sends CC (probably with app-level message)
| | + starts close_timer: 3 * PTO (current probe timeout)
| | |
| | V
| | +---------+ - Reply to input with CC (rate-limited)
| | | CLOSING | - Close/Reset all streams
| | +---------+
| | | |
| V V |
| receives CC |
| | |
idle | |
timer | |
| V |
| +----------+ | - MUST NOT send anything (MAY send a single CC)
| | DRAINING | | - if not already started, starts close_timer: 3 * PTO
| +----------+ | - if not already done, close all streams
| | |
| | |
| close_timer fires
| |
V V
+------------------------+
| CLOSED | - clean up all the resources, drop connection
+------------------------+ state completely
The ngx_quic_close_connection() function gets an "rc" argument, that signals
reason of connection closing:
NGX_OK - initiated by application (i.e. http/3), follow state machine
NGX_DONE - timedout (while idle or draining)
NGX_ERROR - fatal error, destroy connection immediately
The PTO calculations are not yet implemented, hardcoded value of 5s is used.
The function is split into three:
ngx_quic_close_connection() itself cleans up all core nginx things
ngx_quic_close_quic() deals with everything inside c->quic
ngx_quic_close_streams() deals with streams cleanup
The quic and streams cleanup functions may return NGX_AGAIN, thus signalling
that cleanup is not ready yet, and the close cannot continue to next step.
The header size macros for long and short packets were fixed to provide
correct values in bytes.
Currently the sending code limits frames so they don't exceed max_packet_size.
But it does not account the case when a single frame can exceed the limit.
As a result of this patch, big payload (CRYPTO and STREAM) will be split
into a number of smaller frames that fit into advertised max_packet_size
(which specifies final packet size, after encryption).
chrome-unstable 83.0.4103.7 starts with Initial packet number 1.
I couldn't find a proper explanation besides this text in quic-transport:
An endpoint MAY skip packet numbers when sending
packets to detect this (Optimistic ACK Attack) behavior.
+ MAX_STREAM_DATA frame is sent when recv() is performed on stream
The new value is a sum of total bytes received by stream + free
space in a buffer;
The sending of MAX_STREM_DATA frame in response to STREAM_DATA_BLOCKED
frame is adjusted to follow the same logic as above.
+ MAX_DATA frame is sent when total amount of received data is 2x
of current limit. The limit is doubled.
+ Default values of transport parameters are adjusted to more meaningful
values:
initial stream limits are set to quic buffer size instead of
unrealistically small 255.
initial max data is decreased to 16 buffer sizes, in an assumption that
this is enough for a relatively short connection, instead of randomly
chosen big number.
All this allows to initiate a stable flow of streams that does not block
on stream/connection limits (tested with FF 77.0a1 and 100K requests)
Before the patch, full STREAM frame handling was delayed until the frame with
zero offset is received. Only node in the streams tree was created.
This lead to problems when such stream was deleted, in particular, it had no
handlers set for read events.
This patch creates new stream immediately, but delays data delivery until
the proper offset will arrive. This is somewhat similar to how accept()
operation works.
The ngx_quic_add_stream() function is no longer needed and merged into stream
handler. The ngx_quic_stream_input() now only handles frames for existing
streams and does not deal with stream creation.
Frames can still float in the following queues:
- crypto frames reordering queues (one per encryption level)
- moved crypto frames cleanup to the moment where all streams are closed
- stream frames reordering queues (one per packet number namespace)
- frames retransmit queues (one per packet number namespace)
Each stream node now includes incoming frames queue and sent/received counters
for tracking offset. The sent counter is not used, c->sent is used, not like
in crypto buffers, which have no connections.
If offset in CRYPTO frame doesn't match expected, following actions are taken:
a) Duplicate frames or frames within [0...current offset] are ignored
b) New data from intersecting ranges (starts before current_offset, ends
after) is consumed
c) "Future" frames are stored in a sorted queue (min offset .. max offset)
Once a frame is consumed, current offset is updated and the queue is inspected:
we iterate the queue until the gap is found and act as described
above for each frame.
The amount of data in buffered frames is limited by corresponding macro.
The CRYPTO and STREAM frame structures are now compatible: they share
the same set of initial fields. This allows to have code that deals with
both of this frames.
The ordering layer now processes the frame with offset and invokes the
handler when it can organise an ordered stream of data.
Quote: Conceptually, a packet number space is the context in which a packet
can be processed and acknowledged.
ngx_quic_namespace_t => ngx_quic_send_ctx_t
qc->ns => qc->send_ctx
ns->largest => send_ctx->largest_ack
The ngx_quic_ns(level) macro now returns pointer, not just index:
ngx_quic_get_send_ctx(c->quic, level)
ngx_quic_retransmit_ns() => ngx_quic_retransmit()
ngx_quic_output_ns() => ngx_quic_output_frames()
The offset in client CRYPTO frames is tracked in c->quic->crypto_offset_in.
This means that CRYPTO frames with non-zero offset are now accepted making
possible to finish handshake with client certificates that exceed max packet
size (if no reordering happens).
The c->quic->crypto_offset field is renamed to crypto_offset_out to avoid
confusion with tracking of incoming CRYPTO stream.
+ since number of ranges in unknown, provide a function to parse them once
again in handler to avoid memory allocation
+ ack handler now processes all ranges, not only the first
+ ECN counters are parsed and saved into frame if present
Such frames are grouped together in a switch and just ignored, instead of
closing the connection This may improve test coverage. All such frames
require acknowledgment.
The qc->closing flag is set when a connection close is initiated for the first
time.
No timers will be set if the flag is active.
TODO: this is a temporary solution to avoid running timer handlers after
connection (and it's pool) was destroyed. It looks like currently we have
no clear policy of connection closing in regard to timers.
Found with a previously received Initial packet with ACK only, which
instantiates a new connection but do not produce the handshake keys.
This can be triggered by a fairly well behaving client, if the server
stands behind a load balancer that stripped Initial packets exchange.
Found by F5 test suite.
This makes sending large number of bidirectional stream work within ngtcp2,
which doesn't bother sending optional STREAMS_BLOCKED when exhausted.
This also introduces tracking currently opened and maximum allowed streams.
Currently, the output is called periodically, each 200 ms to invoke
ngx_quic_output() that will push all pending frames into packets.
TODO: implement flags a-là Nagle & co (NO_DELAY/NO_PUSH...)
All frames collected to packet are moved into a per-namespace send queue.
QUIC connection has a timer which fires on the closest max_ack_delay time.
The frame is deleted from the queue when a corresponding packet is acknowledged.
The NGX_QUIC_MAX_RETRANSMISSION is a timeout that defines maximum length
of retransmission of a frame.
The quic->keys[4] array now contains secrets related to the corresponding
encryption level. All protection-level functions get proper keys and do
not need to switch manually between levels.
If early data is accepted, SSL_do_handshake() completes as soon as ClientHello
is processed. SSL_in_init() will report the handshake is still in progress.
Static buffers are used instead in functions where decryption takes place.
The pkt->plaintext points to the beginning of a static buffer.
The pkt->payload.data points to decrypted data actual start.
+ ngx_quic_encrypt():
- no longer accepts pool as argument
- pkt is 1st arg
- payload is passed as pkt->payload
- performs encryption to the specified static buffer
+ ngx_quic_create_long/short_packet() functions:
- single buffer for everything, allocated by caller
- buffer layout is: [ ad | payload | TAG ]
the result is in the beginning of buffer with proper length
- nonce is calculated on stack
- log is passed explicitly, pkt is 1st arg
- no more allocations inside
+ ngx_quic_create_long_header():
- args changed: no need to pass str_t
+ added ngx_quic_create_short_header()
+ Client-related errors (i.e. parsing) are done at INFO level
+ c->log->action is updated through the process of receiving, parsing.
handling packet/payload and generating frames/output.
For ngx_http_process_request() part to work, this required to set both
r->http_connection->ssl and c->ssl on a QUIC stream. To avoid damaging
global SSL object, ngx_ssl_shutdown() is managed to ignore QUIC streams.
+ ngx_quic_init_ssl_methods() is no longer there, we setup methods on SSL
connection directly.
+ the handshake_handler is actually a generic quic input handler
+ updated c->log->action and debug to reflect changes and be more informative
+ c->quic is always set in ngx_quic_input()
+ the quic connection state is set by the results of SSL_do_handshake();
note:
+ parameters are available in SSL connection since they are obtained by ssl
stack
quote:
During connection establishment, both endpoints make authenticated
declarations of their transport parameters. These declarations are
made unilaterally by each endpoint.
and really, we send our parameters before we read client's.
no handling of incoming parameters is made by this patch.
- integer parameters can be configured using the following directives:
quic_max_idle_timeout
quic_max_ack_delay
quic_max_packet_size
quic_initial_max_data
quic_initial_max_stream_data_bidi_local
quic_initial_max_stream_data_bidi_remote
quic_initial_max_stream_data_uni
quic_initial_max_streams_bidi
quic_initial_max_streams_uni
quic_ack_delay_exponent
quic_active_migration
quic_active_connection_id_limit
- only following parameters are actually sent:
active_connection_id_limit
initial_max_streams_uni
initial_max_streams_bidi
initial_max_stream_data_bidi_local
initial_max_stream_data_bidi_remote
initial_max_stream_data_uni
(other parameters are to be added into ngx_quic_create_transport_params()
function as needed, should be easy now)
- draft 24 and draft 27 are now supported
(at compile-time using quic_version macro)
The ngx_quic_parse_frame() functions now has new 'pkt' argument: the packet
header of a currently processed frame. This allows to log errors/debug
closer to reasons and perform additional checks regarding possible frame
types. The handler only performs processing of good frames.
A number of functions like read_uint32(), parse_int[_multi] probably should
be implemented as a macro, but currently it is better to have them as
functions for simpler debugging.
Cleanup in ngx_event_quic.c:
+ reorderded functions, structures
+ added missing prototypes
+ added separate handlers for each frame type
+ numerous indentation/comments/TODO fixes
+ removed non-implemented qc->state and corresponding enum;
this requires deep thinking, stub was unused.
+ streams inside quic connection are now in own structure
All code dealing with serializing/deserializing
is moved int srv/event/ngx_event_quic_transport.c/h file.
All macros for dealing with data are internal to source file.
The header file exposes frame types and error codes.
The exported functions are currently packet header parsers and writers
and frames parser/writer.
The ngx_quic_header_t structure is updated with 'log' member. This avoids
passing extra argument to parsing functions that need to report errors.
+ support for more than one initial packet
+ workaround for trailing zeroes in packet
+ ignore application data packet if no keys yet (issue in draft 27/ff nightly)
+ fixed PING frame parser
+ STREAM frames need to be acknowledged
The following HTTP configuration is used for firefox (v74):
http {
ssl_certificate_key localhost.key;
ssl_certificate localhost.crt;
ssl_protocols TLSv1.2 TLSv1.3;
server {
listen 127.0.0.1:10368 reuseport http3;
ssl_quic on;
server_name localhost;
location / {
return 200 "This-is-QUICK\n";
}
}
server {
listen 127.0.0.1:5555 ssl; # point the browser here
server_name localhost;
location / {
add_header Alt-Svc 'h3-24=":10368";ma=100';
return 200 "ALT-SVC";
}
}
}
New files:
src/event/ngx_event_quic_protection.h
src/event/ngx_event_quic_protection.c
The protection.h header provides interface to the crypto part of the QUIC:
2 functions to initialize corresponding secrets:
ngx_quic_set_initial_secret()
ngx_quic_set_encryption_secret()
and 2 functions to deal with packet processing:
ngx_quic_encrypt()
ngx_quic_decrypt()
Also, structures representing secrets are defined there.
All functions require SSL connection and a pool, only crypto operations
inside, no access to nginx connections or events.
Currently pool->log is used for the logging (instead of original c->log).
- events handling moved into src/event/ngx_event_quic.c
- http invokes once ngx_quic_run() and passes stream callback
(diff to original http_request.c is now minimal)
- streams are stored in rbtree using ID as a key
- when a new stream is registered, appropriate callback is called
- ngx_quic_stream_t type represents STREAM and stored in c->qs
- now NEW_CONNECTION_ID frames can be received and parsed
The packet structure is created in ngx_quic_input() and passed
to all handlers (initial, handshake and application data).
The UDP datagram buffer is saved as pkt->raw;
The QUIC packet is stored as pkt->data and pkt->len (instead of pkt->buf)
(pkt->len is adjusted after parsing headers to actual length)
The pkt->pos is removed, pkt->raw->pos is used instead.
- added basic parsing of ACK, PING and PADDING frames on input
- added preliminary parsing of SHORT headers
The ngx_quic_output() is now called after processing of each input packet.
Frames are added into output queue according to their level: inital packets
go ahead of handshake and application data, so they can be merged properly.
The payload handler is called from both new, handshake and applicataion data
handlers (latter is a stub).
As was objserved with ngtcp2 client, Finished CRYPTO frame within Handshake
packet may not be sent for some reason if there's nothing to append on 1-RTT.
This results in unnecessary retransmit. To avoid this edge case, a non-zero
active_connection_id_limit transport parameter is now used to append datagram
with NEW_CONNECTION_ID 1-RTT frames.
Now handshake generates frames, and they are queued in c->quic->frames.
The ngx_quic_output() is called from ngx_quic_flush_flight() or manually,
processes the queue and encrypts all frames according to required encryption
level.
ngx_quic_hexdump0(log, format, buffer, buffer_size);
- logs hexdump of buffer to specified error log
ngx_quic_hexdump0(c->log, "this is foo:", foo.data, foo.len);
ngx_quic_hexdump(log, format, buffer, buffer_size, ...)
- same as hexdump0, but more format/args possible:
ngx_quic_hexdump(c->log, "a=%d b=%d, foo is:", foo.data, foo.len, a, b);
Introduced ngx_quic_input() and ngx_quic_output() as interface between
nginx and protocol. They are the only functions that are exported.
While there, added copyrights.
For NGINX troubleshooting/technical help, please visit our community forum instead of asking your questions here. We will politely redirect these types of questions to the forum.
- type:textarea
id:general
attributes:
label:What would you like to discuss?
description:Please provide as much context as possible. Remember that only general discussions related to the NGINX codebase will be addressed on GitHub. For anything else, please visit our [community forum](https://community.nginx.org/).
For NGINX troubleshooting/technical help, please visit our community forum instead of asking your questions here. We will politely redirect these types of questions to the forum.
- type:textarea
id:ideas
attributes:
label:What idea would you like to discuss?
description:Please provide as much context as possible. Remember that only ideas related to the NGINX codebase will be addressed on GitHub. For anything else, please visit our [community forum](https://community.nginx.org/).
For NGINX troubleshooting/technical help, please visit our community forum instead of asking your questions here. We will politely redirect these types of questions to the forum.
- type:textarea
id:q-a
attributes:
label:What question do you have?
description:Please provide as much context as possible. Remember that only questions related to the NGINX codebase will be addressed on GitHub. For anything else, please visit our [community forum](https://community.nginx.org/).
Thanks for taking the time to fill out this bug report!
Before you continue filling out this report, please take a moment to check that your bug has not been [already reported on GitHub][issue search], is reproducible with the latest version of nginx, and does not involve any third-party modules 🙌
Remember to redact any sensitive information such as authentication credentials and/or license keys!
**Note:**If you are seeking community support, please start a new topic in the [NGINX Community forum][forum]. If you wish to discuss the codebase, please start a new thread via [GitHub discussions][discussions].
Thanks for taking the time to fill out this feature request!
Before you continue filling out this request, please take a moment to check that your feature has not been [already requested on GitHub][issue search] 🙌
**Note:**If you are seeking community support, please start a new topic in the [NGINX Community forum][forum]. If you wish to discuss the codebase, please start a new thread via [GitHub discussions][discussions].
Describe the use case and detail of the change. If this PR addresses an issue on GitHub, make sure to include a link to that issue using one of the [supported keywords](https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue) in this PR's description or commit message.
### Checklist
Before creating a PR, run through this checklist and mark each as complete:
- [ ] I have read the [contributing guidelines](/CONTRIBUTING.md).
- [ ] I have checked that NGINX compiles and runs after adding my changes.
if:(github.event.comment.body == 'recheck' || github.event.comment.body == 'I have hereby read the F5 CLA and agree to its terms') || github.event_name == 'pull_request_target'
custom-notsigned-prcomment:'🎉 Thank you for your contribution! It appears you have not yet signed the [F5 Contributor License Agreement (CLA)](https://github.com/f5/f5-cla/blob/main/docs/f5_cla.md), which is required for your changes to be incorporated into an F5 Open Source Software (OSS) project. Please kindly read the [F5 CLA](https://github.com/f5/f5-cla/blob/main/docs/f5_cla.md) and reply on a new comment with the following text to agree:'
custom-pr-sign-comment:'I have hereby read the F5 CLA and agree to its terms'
custom-allsigned-prcomment:'✅ All required contributors have signed the F5 CLA for this PR. Thank you!'
# Remote repository storing CLA signatures.
remote-organization-name:f5
remote-repository-name:f5-cla-data
# Branch where CLA signatures are stored.
branch:main
path-to-signatures:signatures/signatures.json
# Comma separated list of usernames for maintainers or any other individuals who should not be prompted for a CLA.
# NOTE: You will want to edit the usernames to suit your project needs.
[](https://www.repostatus.org/#active)
[](/CODE_OF_CONDUCT.md)
NGINX (pronounced "engine x" or "en-jin-eks") is the world's most popular Web Server, high performance Load Balancer, Reverse Proxy, API Gateway and Content Cache.
NGINX is free and open source software, distributed under the terms of a simplified [2-clause BSD-like license](LICENSE).
Enterprise distributions, commercial support and training are available from [F5, Inc](https://www.f5.com/products/nginx).
> [!IMPORTANT]
> The goal of this README is to provide a basic, structured introduction to NGINX for novice users. Please refer to the [full NGINX documentation](https://nginx.org/en/docs/) for detailed information on [installing](https://nginx.org/en/docs/install.html), [building](https://nginx.org/en/docs/configure.html), [configuring](https://nginx.org/en/docs/dirindex.html), [debugging](https://nginx.org/en/docs/debugging_log.html), and more. These documentation pages also contain a more detailed [Beginners Guide](https://nginx.org/en/docs/beginners_guide.html), How-Tos, [Development guide](https://nginx.org/en/docs/dev/development_guide.html), and a complete module and [directive reference](https://nginx.org/en/docs/dirindex.html).
# Table of contents
- [How it works](#how-it-works)
- [Modules](#modules)
- [Configurations](#configurations)
- [Runtime](#runtime)
- [Downloading and installing](#downloading-and-installing)
- [Stable and Mainline binaries](#stable-and-mainline-binaries)
- [Cloning the NGINX GitHub repository](#cloning-the-nginx-github-repository)
- [Configuring the build](#configuring-the-build)
- [Compiling](#compiling)
- [Location of binary and installation](#location-of-binary-and-installation)
- [Running and testing the installed binary](#running-and-testing-the-installed-binary)
- [Asking questions and reporting issues](#asking-questions-and-reporting-issues)
- [Contributing code](#contributing-code)
- [Additional help and resources](#additional-help-and-resources)
- [Changelog](#changelog)
- [License](#license)
# How it works
NGINX is installed software with binary packages available for all major operating systems and Linux distributions. See [Tested OS and Platforms](https://nginx.org/en/#tested_os_and_platforms) for a full list of compatible systems.
> [!IMPORTANT]
> While nearly all popular Linux-based operating systems are distributed with a community version of nginx, we highly advise installation and usage of official [packages](https://nginx.org/en/linux_packages.html) or sources from this repository. Doing so ensures that you're using the most recent release or source code, including the latest feature-set, fixes and security patches.
## Modules
NGINX is comprised of individual modules, each extending core functionality by providing additional, configurable features. See "Modules reference" at the bottom of [nginx documentation](https://nginx.org/en/docs/) for a complete list of official modules.
NGINX modules can be built and distributed as static or dynamic modules. Static modules are defined at build-time, compiled, and distributed in the resulting binaries. See [Dynamic Modules](#dynamic-modules) for more information on how they work, as well as, how to obtain, install, and configure them.
> [!TIP]
> You can issue the following command to see which static modules your NGINX binaries were built with:
```bash
nginx -V
```
> See [Configuring the build](#configuring-the-build) for information on how to include specific Static modules into your nginx build.
## Configurations
NGINX is highly flexible and configurable. Provisioning the software is achieved via text-based config file(s) accepting parameters called "[Directives](https://nginx.org/en/docs/dirindex.html)". See [Configuration File's Structure](https://nginx.org/en/docs/beginners_guide.html#conf_structure) for a comprehensive description of how NGINX configuration files work.
> [!NOTE]
> The set of directives available to your distribution of NGINX is dependent on which [modules](#modules) have been made available to it.
## Runtime
Rather than running in a single, monolithic process, NGINX is architected to scale beyond Operating System process limitations by operating as a collection of processes. They include:
- A "master" process that maintains worker processes, as well as, reads and evaluates configuration files.
- One or more "worker" processes that process data (eg. HTTP requests).
The number of [worker processes](https://nginx.org/en/docs/ngx_core_module.html#worker_processes) is defined in the configuration file and may be fixed for a given configuration or automatically adjusted to the number of available CPU cores. In most cases, the latter option optimally balances load across available system resources, as NGINX is designed to efficiently distribute work across all worker processes.
> [!TIP]
> Processes synchronize data through shared memory. For this reason, many NGINX directives require the allocation of shared memory zones. As an example, when configuring [rate limiting](https://nginx.org/en/docs/http/ngx_http_limit_req_module.html#limit_req), connecting clients may need to be tracked in a [common memory zone](https://nginx.org/en/docs/http/ngx_http_limit_req_module.html#limit_req_zone) so all worker processes can know how many times a particular client has accessed the server in a span of time.
# Downloading and installing
Follow these steps to download and install precompiled NGINX binaries. You may also choose to [build NGINX locally from source code](#building-from-source).
## Stable and Mainline binaries
NGINX binaries are built and distributed in two versions: stable and mainline. Stable binaries are built from stable branches and only contain critical fixes backported from the mainline version. Mainline binaries are built from the [master branch](https://github.com/nginx/nginx/tree/master) and contain the latest features and bugfixes. You'll need to [decide which is appropriate for your purposes](https://docs.nginx.com/nginx/admin-guide/installing-nginx/installing-nginx-open-source/#choosing-between-a-stable-or-a-mainline-version).
## Linux binary installation process
The NGINX binary installation process takes advantage of package managers native to specific Linux distributions. For this reason, first-time installations involve adding the official NGINX package repository to your system's package manager. Follow [these steps](https://nginx.org/en/linux_packages.html) to download, verify, and install NGINX binaries using the package manager appropriate for your Linux distribution.
### Upgrades
Future upgrades to the latest version can be managed using the same package manager without the need to manually download and verify binaries.
## FreeBSD installation process
For more information on installing NGINX on FreeBSD system, visit https://nginx.org/en/docs/install.html
## Windows executables
Windows executables for mainline and stable releases can be found on the main [NGINX download page](https://nginx.org/en/download.html). Note that the current implementation of NGINX for Windows is at the Proof-of-Concept stage and should only be used for development and testing purposes. For additional information, please see [nginx for Windows](https://nginx.org/en/docs/windows.html).
## Dynamic modules
NGINX version 1.9.11 added support for [Dynamic Modules](https://nginx.org/en/docs/ngx_core_module.html#load_module). Unlike Static modules, dynamically built modules can be downloaded, installed, and configured after the core NGINX binaries have been built. [Official dynamic module binaries](https://nginx.org/en/linux_packages.html#dynmodules) are available from the same package repository as the core NGINX binaries described in previous steps.
> [!TIP]
> [NGINX JavaScript (njs)](https://github.com/nginx/njs), is a popular NGINX dynamic module that enables the extension of core NGINX functionality using familiar JavaScript syntax.
> [!IMPORTANT]
> If desired, dynamic modules can also be built statically into NGINX at compile time.
# Getting started with NGINX
For a gentle introduction to NGINX basics, please see our [Beginner’s Guide](https://nginx.org/en/docs/beginners_guide.html).
## Installing SSL certificates and enabling TLS encryption
See [Configuring HTTPS servers](https://nginx.org/en/docs/http/configuring_https_servers.html) for a quick guide on how to enable secure traffic to your NGINX installation.
## Load Balancing
For a quick start guide on configuring NGINX as a Load Balancer, please see [Using nginx as HTTP load balancer](https://nginx.org/en/docs/http/load_balancing.html).
## Rate limiting
See our [Rate Limiting with NGINX](https://blog.nginx.org/blog/rate-limiting-nginx) blog post for an overview of core concepts for provisioning NGINX as an API Gateway.
## Content caching
See [A Guide to Caching with NGINX and NGINX Plus](https://blog.nginx.org/blog/nginx-caching-guide) blog post for an overview of how to use NGINX as a content cache (e.g. edge server of a content delivery network).
# Building from source
The following steps can be used to build NGINX from source code available in this repository.
## Installing dependencies
Most Linux distributions will require several dependencies to be installed in order to build NGINX. The following instructions are specific to the `apt` package manager, widely available on most Ubuntu/Debian distributions and their derivatives.
> [!TIP]
> It is always a good idea to update your package repository lists prior to installing new packages.
> ```bash
> sudo apt update
> ```
### Installing compiler and make utility
Use the following command to install the GNU C compiler and Make utility.
```bash
sudo apt install gcc make
```
### Installing dependency libraries
```bash
sudo apt install libpcre3-dev zlib1g-dev
```
> [!WARNING]
> This is the minimal set of dependency libraries needed to build NGINX with rewriting and gzip capabilities. Other dependencies may be required if you choose to build NGINX with additional modules. Monitor the output of the `configure` command discussed in the following sections for information on which modules may be missing. For example, if you plan to use SSL certificates to encrypt traffic with TLS, you'll need to install the OpenSSL library. To do so, issue the following command.
>```bash
>sudo apt install libssl-dev
## Cloning the NGINX GitHub repository
Using your preferred method, clone the NGINX repository into your development directory. See [Cloning a GitHub Repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository) for additional help.
```bash
git clone https://github.com/nginx/nginx.git
```
## Configuring the build
Prior to building NGINX, you must run the `configure` script with [appropriate flags](https://nginx.org/en/docs/configure.html). This will generate a Makefile in your NGINX source root directory that can then be used to compile NGINX with [options specified during configuration](https://nginx.org/en/docs/configure.html).
From the NGINX source code repository's root directory:
```bash
auto/configure
```
> [!IMPORTANT]
> Configuring the build without any flags will compile NGINX with the default set of options. Please refer to https://nginx.org/en/docs/configure.html for a full list of available build configuration options.
## Compiling
The `configure` script will generate a `Makefile` in the NGINX source root directory upon successful execution. To compile NGINX into a binary, issue the following command from that same directory:
```bash
make
```
## Location of binary and installation
After successful compilation, a binary will be generated at `<NGINX_SRC_ROOT_DIR>/objs/nginx`. To install this binary, issue the following command from the source root directory:
```bash
sudo make install
```
> [!IMPORTANT]
> The binary will be installed into the `/usr/local/nginx/` directory.
## Running and testing the installed binary
To run the installed binary, issue the following command:
```bash
sudo /usr/local/nginx/sbin/nginx
```
You may test NGINX operation using `curl`.
```bash
curl localhost
```
The output of which should start with:
```html
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
```
# Asking questions and reporting issues
See our [Support](SUPPORT.md) guidelines for information on how discuss the codebase, ask troubleshooting questions, and report issues.
# Contributing code
Please see the [Contributing](CONTRIBUTING.md) guide for information on how to contribute code.
# Additional help and resources
- See the [NGINX Community Blog](https://blog.nginx.org/) for more tips, tricks and HOW-TOs related to NGINX and related projects.
- Access [nginx.org](https://nginx.org/), your go-to source for all documentation, information and software related to the NGINX suite of projects.
# Changelog
See our [changelog](https://nginx.org/en/CHANGES) to keep track of updates.
# License
[2-clause BSD-like license](LICENSE)
---
Additional documentation available at: https://nginx.org/en/docs
This document provides an overview of security concerns related to nginx
deployments, focusing on confidentiality, integrity, availability, and the
implications of configurations and misconfigurations.
## Reporting a Vulnerability
Please report any vulnerabilities via one of the following methods
(in order of preference):
1. [Report a vulnerability](https://docs.github.com/en/code-security/security-advisories/guidance-on-reporting-and-writing-information-about-vulnerabilities/privately-reporting-a-security-vulnerability)
within this repository. We are using the GitHub workflow that allows us to
manage vulnerabilities in a private manner and interact with reporters
securely.
2. [Report directly to F5](https://www.f5.com/services/support/report-a-vulnerability).
3. Report via email to security-alert@nginx.org.
This method will be deprecated in the future.
### Vulnerability Disclosure and Fix Process
The nginx team expects that all suspected vulnerabilities be reported
privately via the
[Reporting a Vulnerability](SECURITY.md#reporting-a-vulnerability) guidelines.
If a publicly released vulnerability is reported, we
may request to handle it according to the private disclosure process.
If the reporter agrees, we will follow the private disclosure process.
Security fixes will be applied to all supported stable releases, as well
as the mainline version, as applicable. We recommend using the most recent
mainline or stable release of nginx. Fixes are created and tested by the core
team using a GitHub private fork for security. If necessary, the reporter
may be invited to contribute to the fork and assist with the solution.
The nginx team is committed to responsible information disclosure with
sufficient detail, such as the CVSS score and vector. Privately disclosed
vulnerabilities are embargoed by default until the fix is released.
Communications and fixes remain private until made public. As nginx is