| Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
RFC 9002, Section 6.1.1 defines packet reordering threshold as 3. Testing
shows that such low value leads to spurious packet losses followed by
congestion window collapse. The change implements dynamic packet threshold
detection based on in-flight packet range. Packet threshold is defined
as half the number of in-flight packets, with mininum value of 3.
Also, renamed ngx_quic_lost_threshold() to ngx_quic_time_threshold()
for better compliance with RFC 9002 terms.
|
|
Previosly the threshold was hardcoded at 10000. This value is too low for
high BDP networks. For example, if all frames are STREAM frames, and MTU
is 1500, the upper limit for congestion window would be roughly 15M
(10000 * 1500). With 100ms RTT it's just a 1.2Gbps network (15M * 10 * 8).
In reality, the limit is even lower because of other frame types. Also,
the number of frames that could be used simultaneously depends on the total
amount of data buffered in all server streams, and client flow control.
The change sets frame threshold based on max concurrent streams and stream
buffer size, the product of which is the maximum number of in-flight stream
data in all server streams at any moment. The value is divided by 2000 to
account for a typical MTU 1500 and the fact that not all frames are STREAM
frames.
|
|
|
|
If connection is network-limited, MTU probes have little chance of being
sent since congestion window is almost always full. As a result, PMTUD
may not be able to reach the real MTU and the connection may operate with
a reduced MTU. The solution is to ignore the congestion window. This may
lead to a temporary increase in in-flight count beyond congestion window.
|
|
As per RFC 9000, Section 14.4:
Loss of a QUIC packet that is carried in a PMTU probe is therefore
not a reliable indication of congestion and SHOULD NOT trigger a
congestion control reaction.
|
|
As per RFC 9002, Section 7.8, congestion window should not be increased
when it's underutilized.
|
|
Previously, these functions operated on a per-level basis. This however
resulted in excessive logging of in_flight and will also led to extra
work detecting underutilized congestion window in the followup patches.
|
|
On some systems the value of ngx_current_msec is derived from monotonic
clock, for which the following is defined by POSIX:
For this clock, the value returned by clock_gettime() represents
the amount of time (in seconds and nanoseconds) since an unspecified
point in the past.
As as result, overflow protection is needed when comparing two ngx_msec_t.
The change adds such protection to the ngx_quic_detect_lost() function.
|
|
Since recovery_start field was initialized with ngx_current_msec, all
congestion events that happened within the same millisecond or cycle
iteration, were treated as in recovery mode.
Also, when handling persistent congestion, initializing recovery_start
with ngx_current_msec resulted in treating all sent packets as in recovery
mode, which violates RFC 9002, see example in Appendix B.8.
While here, also fixed recovery_start wrap protection. Previously it used
2 * max_idle_timeout time frame for all sent frames, which is not a
reliable protection since max_idle_timeout is unrelated to congestion
control. Now recovery_start <= now condition is enforced. Note that
recovery_start wrap is highly unlikely and can only occur on a
32-bit system if there are no congestion events for 24 days.
|
|
As per RFC 9002, Section B.2, max_datagram_size used in congestion window
computations should be based on path MTU.
|
|
Improved logging for simpler data extraction for plotting congestion
window graphs. In particular, added current milliseconds number from
ngx_current_msec.
While here, simplified logging text and removed irrelevant data.
|
|
Starting with OpenSSL 3.0, groups may be added externally with pluggable
KEM providers. Using SSL_get_negotiated_group(), which makes lookup in a
static table with known groups, doesn't allow to list such groups by names
leaving them in hex. Adding X25519MLKEM768 to the default group list in
OpenSSL 3.5 made this problem more visible. SSL_get0_group_name() and,
apparently, SSL_group_to_name() allow to resolve such provider-implemented
groups, which is also "generally preferred" over SSL_get_negotiated_group()
as documented in OpenSSL git commit 93d4f6133f.
This change makes external groups listing by name using SSL_group_to_name()
available since OpenSSL 3.0. To preserve "prime256v1" naming for the group
0x0017, and to avoid breaking BoringSSL and older OpenSSL versions support,
it is used supplementary for a group that appears to be unknown.
See https://github.com/openssl/openssl/issues/27137 for related discussion.
|
|
Upstream SSL sessions may be of a noticeably larger size with tickets
in TLSv1.2 and older versions, or with "stateless" tickets in TLSv1.3,
if a client certificate is saved into the session. Further, certain
stateless session resumption implemetations may store additional data.
Such one is JDK, known to also include server certificates in session
ticket data, which roughly doubles a decoded session size to slightly
beyond the previous limit. While it's believed to be an issue on the
JDK side, this change allows to save such sessions.
Another, innocent case is using RSA certificates with 8192 key size.
|
|
All such transient buffers are converted to the single storage in BSS.
In preparation to raise the limit.
|
|
|
|
This is consistent with the rest of the code and fixes build on systems
with non-standard definition of struct iovec (Solaris, Illumos).
|
|
|
|
This can happen with certificates and certificate keys specified
with variables due to partial cache update in various scenarios:
- cache expiration with only one element of pair evicted
- on-disk update with non-cacheable encrypted keys
- non-atomic on-disk update
The fix is to retry with fresh data on X509_R_KEY_VALUES_MISMATCH.
|
|
Revalidation is based on file modification time and uniq file index,
and happens after the cache object validity time is expired.
|
|
A new directive "ssl_certificate_cache max=N [valid=time] [inactive=time]"
enables caching of SSL certificate chain and secret key objects specified
by "ssl_certificate" and "ssl_certificate_key" directives with variables.
Co-authored-by: Aleksei Bavshin <a.bavshin@nginx.com>
|
|
SSL object cache, as previously introduced in 1.27.2, did not take
into account encrypted certificate keys that might be unexpectedly
fetched from the cache regardless of the matching passphrase. To
avoid this, caching of encrypted certificate keys is now disabled
based on the passphrase callback invocation.
A notable exception is encrypted certificate keys configured without
ssl_password_file. They are loaded once resulting in the passphrase
prompt on startup and reused in other contexts as applicable.
|
|
Memory based objects are always inherited, engine based objects are
never inherited to adhere the volatile nature of engines, file based
objects are inherited subject to modification time and file index.
The previous behaviour to bypass cache from the old configuration cycle
is preserved with a new directive "ssl_object_cache_inheritable off;".
|
|
While trying to close a stream in ngx_quic_close_streams() by calling its
read event handler, the next stream saved prior to that could be destroyed
recursively. This caused a segfault while trying to access the next stream.
The way the next stream could be destroyed in HTTP/3 is the following.
A request stream read event handler ngx_http_request_handler() could
end up calling ngx_http_v3_send_cancel_stream() to report a cancelled
request stream in the decoder stream. If sending stream cancellation
decoder instruction fails for any reason, and the decoder stream is the
next in order after the request stream, the issue is triggered.
The fix is to postpone calling read event handlers for all streams being
closed to avoid closing a released stream.
|
|
Previously, such packets were treated as long header packets with unknown
version 0, and a version negotiation packet was sent in response. This
could be used to set up an infinite traffic reflect loop with another nginx
instance.
Now version negotiation packets are ignored. As per RFC 9000, Section 6.1:
An endpoint MUST NOT send a Version Negotiation packet in response to
receiving a Version Negotiation packet.
|
|
Since 0-RTT and 1-RTT data exist in the same packet number space,
ngx_quic_discard_ctx incorrectly discards 1-RTT packets when
0-RTT keys are discarded.
The issue was introduced by 58b92177e7c3c50f77f807ab3846ad5c7bbf0ebe.
|
|
|
|
This follows OpenSSL and BoringSSL API, and gives a hint to compiler
that this parameter may not be modified.
|
|
|
|
This simplifies merging protocol values after ea15896 and ebd18ec.
Further, as outlined in ebd18ec18, for libraries preceeding TLSv1.2+
support, only meaningful versions TLSv1 and TLSv1.1 are set by default.
While here, fixed indentation.
|
|
|
|
This change initializes the "err" variable, used to produce a meaningful
diagnostics on error path, to a good safe value.
|
|
Since a2a513b93cae, stream frames no longer need to be retransmitted after it
was deleted. The frames which were retransmitted before, could be stream data
frames sent prior to a RESET_STREAM. Such retransmissions are explicitly
prohibited by RFC 9000, Section 19.4.
|
|
This can potentially provide a large amount of savings,
because CA certificates can be quite large.
Based on previous work by Mini Hawthorne.
|
|
Based on previous work by Mini Hawthorne.
|
|
EVP_KEY objects are a reference-counted container for key material, shallow
copies and OpenSSL stack management aren't needed as with certificates.
Based on previous work by Mini Hawthorne.
|
|
Certificate chains are now loaded once.
The certificate cache provides each chain as a unique stack of reference
counted elements. This shallow copy is required because OpenSSL stacks
aren't reference counted.
Based on previous work by Mini Hawthorne.
|
|
Added ngx_openssl_cache_module, which indexes a type-aware object cache.
It maps an id to a unique instance, and provides references to it, which
are dropped when the cycle's pool is destroyed.
The cache will be used in subsequent patches.
Based on previous work by Mini Hawthorne.
|
|
Instead of cross-linking the objects using exdata, pointers to configured
certificates are now stored in ngx_ssl_t, and OCSP staples are now accessed
with rbtree in it. This allows sharing these objects between SSL contexts.
Based on previous work by Mini Hawthorne.
|
|
|
|
Previously, this used to have extra ngx_explicit_memzero() calls
from within ngx_quic_keys_cleanup(), which might be suboptimal.
|
|
For simplicity, this is done on successful decryption of a 1-RTT packet.
|
|
Previously the last chain field of ngx_quic_buffer_t could still reference freed
chains and buffers after calling ngx_quic_free_buffer(). While normally an
ngx_quic_buffer_t object should not be used after freeing, resetting last_chain
field would prevent a potential use-after-free.
|
|
Sending handshake-level CRYPTO frames after the client's Finished message could
lead to memory disclosure and a potential segfault, if those frames are sent in
one packet with the Finished frame.
|
|
|
|
When loading certificate keys via ENGINE_load_private_key() in runtime,
it was possible to overwrite configuration on ENGINE_by_id() failure.
OpenSSL documention doesn't describe errors in details, the only reason
I found in the comment to example is when the engine is not available.
|
|
The ngx_quic_run() function uses qc->close timer to limit the handshake
duration. Normally it is removed by ngx_quic_do_init_streams() which is
called once when we are done with initial SSL processing.
The problem happens when the client sends early data and streams are
initialized in the ngx_quic_run() -> ngx_quic_handle_datagram() call.
The order of set/remove timer calls is now reversed; the close timer is
set up and the timer fires when assigned, starting the unexpected connection
close process.
The fix is to skip setting the timer if streams were initialized during
handling of the initial datagram. The idle timer for quic is set anyway,
and stream-related timeouts are managed by application layer.
|
|
Stream connection cleanup handler ngx_quic_stream_cleanup_handler() calls
ngx_quic_shutdown_stream() after which it resets the pointer from quic stream
to the connection (sc->connection = NULL). Previously if this call failed,
sc->connection retained the old value, while the connection was freed by the
application code. This resulted later in a second attempt to close the freed
connection, which lead to allocator double free error.
The fix is to reset the sc->connection pointer in case of error.
|
|
Inspired by RFC 9001, Section 6.3, trial packet decryption with the current
keys is now used to avoid a timing side-channel signal. Further, this fixes
segfault while accessing missing next keys (ticket #2585).
|
|
Previously if an MTU probe send failed early in ngx_quic_frame_sendto()
due to allocation error or congestion control, the application level packet
number was not increased, but was still saved as MTU probe packet number.
Later when a packet with this number was acknowledged, the unsent MTU probe
was acknowledged as well. This could result in discovering a bigger MTU than
supported by the path, which could lead to EMSGSIZE (Message too long) errors
while sending further packets.
The problem existed since PMTUD was introduced in 58afcd72446f (1.25.2).
Back then only the unlikely memory allocation error could trigger it. However
in efcdaa66df2e congestion control was added to ngx_quic_frame_sendto() which
can now trigger the issue with a higher probability.
|