<feed xmlns='http://www.w3.org/2005/Atom'>
<title>nginx.git/src/http/ngx_http_file_cache.c, branch release-1.29.4</title>
<subtitle>nginx</subtitle>
<link rel='alternate' type='text/html' href='https://git.sigsegv.uk/nginx.git/'/>
<entry>
<title>Fixed request termination with AIO and subrequests (ticket #2555).</title>
<updated>2024-01-30T00:20:05+00:00</updated>
<author>
<name>Maxim Dounin</name>
<email>mdounin@mdounin.ru</email>
</author>
<published>2024-01-30T00:20:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.sigsegv.uk/nginx.git/commit/?id=c251961c4186ce93cf6eb3c99bf5b7114535d490'/>
<id>c251961c4186ce93cf6eb3c99bf5b7114535d490</id>
<content type='text'>
When a request was terminated due to an error via ngx_http_terminate_request()
while an AIO operation was running in a subrequest, various issues were
observed.  This happened because ngx_http_request_finalizer() was only set
in the subrequest where ngx_http_terminate_request() was called, but not
in the subrequest where the AIO operation was running.  After completion
of the AIO operation, normal processing of the subrequest was resumed,
leading to the issues observed.

In particular, in the case of the upstream module, termination of the
request called the upstream cleanup handler, which closed the upstream
connection.  Attempts to continue working with the upstream connection
after completion of the AIO operation resulted in segfaults in
ngx_ssl_recv(), "readv() failed (9: Bad file descriptor) while reading
upstream" errors, or socket leaks.

In ticket #2555, issues were observed with the following configuration
with cache background update (with thread writing instrumented to
introduce a delay, and with a client closing the connection during an
update):

    location = /background-and-aio-write {
        proxy_pass ...
        proxy_cache one;
        proxy_cache_valid 200 1s;
        proxy_cache_background_update on;
        proxy_cache_use_stale updating;
        aio threads;
        aio_write on;
        limit_rate 1000;
    }

The same issue can be seen with SSI, where it can be caused by
errors in subrequests, such as in the following configuration
(here "/proxy" uses AIO, and "/sleep" returns 444 after some delay,
causing request termination):

    location = /ssi-active-boom {
        ssi on;
        ssi_types *;
        return 200 '
                   &lt;!--#include virtual="/proxy" --&gt;
                   &lt;!--#include virtual="/sleep" --&gt;
                   ';
        limit_rate 1000;
    }

Or the same with both the AIO operation and the error in non-active
subrequests (a case which needs slightly different handling, see below):

    location = /ssi-non-active-boom {
        ssi on;
        ssi_types *;
        return 200 '
                   &lt;!--#include virtual="/static" --&gt;
                   &lt;!--#include virtual="/proxy" --&gt;
                   &lt;!--#include virtual="/sleep" --&gt;
                   ';
        limit_rate 1000;
    }

Similarly, issues can be observed with just static files.  However,
with static files the potential impact is limited due to the timeout
safeguards in ngx_http_writer(), and the fact that c-&gt;error is set
during request termination.

In a simple case with an AIO operation in the active subrequest,
such as in the following configuration, the connection is closed right
after completion of the AIO operation anyway, since ngx_http_writer()
tries to write to the connection and fails due to c-&gt;error being set:

    location = /ssi-active-static-boom {
        ssi on;
        ssi_types *;
        return 200 '
                   &lt;!--#include virtual="/static-aio" --&gt;
                   &lt;!--#include virtual="/sleep" --&gt;
                   ';
        limit_rate 1000;
    }

In the following configuration, with an AIO operation in a non-active
subrequest, the connection is closed only after send_timeout expires:

    location = /ssi-non-active-static-boom {
        ssi on;
        ssi_types *;
        return 200 '
                   &lt;!--#include virtual="/static" --&gt;
                   &lt;!--#include virtual="/static-aio" --&gt;
                   &lt;!--#include virtual="/sleep" --&gt;
                   ';
        limit_rate 1000;
    }

Fix is to introduce the r-&gt;main-&gt;terminated flag, which is to be
checked by AIO event handlers when the r-&gt;main-&gt;blocked counter is
decremented.  When the flag is set, handlers are expected to wake up the
connection instead of the subrequest (which might already be cleaned up).

Additionally, ngx_http_request_finalizer() is now always set in the
active subrequest, so waking up the connection properly finalizes the
request even if termination happened in a non-active subrequest.
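
For illustration, the cache AIO completion handler now follows this
pattern (a simplified sketch based on the description above; debug
logging and thread-specific details omitted):

    static void
    ngx_http_cache_aio_event_handler(ngx_event_t *ev)
    {
        ngx_event_aio_t     *aio;
        ngx_connection_t    *c;
        ngx_http_request_t  *r;

        aio = ev-&gt;data;
        r = aio-&gt;data;
        c = r-&gt;connection;

        r-&gt;main-&gt;blocked--;
        r-&gt;aio = 0;

        if (r-&gt;main-&gt;terminated) {

            /*
             * trigger connection event handler if the request was
             * terminated
             */

            c-&gt;write-&gt;handler(c-&gt;write);

        } else {
            r-&gt;write_event_handler(r);
            ngx_http_run_posted_requests(c);
        }
    }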
</content>
</entry>
<entry>
<title>AIO operations now add timers (ticket #2162).</title>
<updated>2024-01-29T07:31:37+00:00</updated>
<author>
<name>Maxim Dounin</name>
<email>mdounin@mdounin.ru</email>
</author>
<published>2024-01-29T07:31:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.sigsegv.uk/nginx.git/commit/?id=b794465178ecf48f13b37e4397145b05fd3fe6df'/>
<id>b794465178ecf48f13b37e4397145b05fd3fe6df</id>
<content type='text'>
Each AIO (thread IO) operation being run is now accompanied by a 1-minute
timer.  This timer prevents unexpected shutdown of the worker process while
an AIO operation is running, and logs an alert if the operation runs
for too long.

This fixes "open socket left" alerts during worker processes shutdown
due to pending AIO (or thread IO) operations while corresponding requests
have no timers.  In particular, such errors were observed while reading
cache headers (ticket #2162), and with worker_shutdown_timeout.
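
Schematically, the change amounts to the following (a minimal sketch;
the exact call sites, and how the timer is cleared when request
processing resumes, are simplified):

    /* before starting an AIO (or thread IO) operation */

    ngx_add_timer(r-&gt;connection-&gt;write, 60000);

    r-&gt;main-&gt;blocked++;
    r-&gt;aio = 1;

    /* ... and once the operation completes ... */

    if (r-&gt;connection-&gt;write-&gt;timer_set) {
        ngx_del_timer(r-&gt;connection-&gt;write);
    }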
</content>
</entry>
<entry>
<title>Fixed "zero size buf" alerts with subrequests.</title>
<updated>2024-01-28T02:23:33+00:00</updated>
<author>
<name>Maxim Dounin</name>
<email>mdounin@mdounin.ru</email>
</author>
<published>2024-01-28T02:23:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.sigsegv.uk/nginx.git/commit/?id=384a8d8dfbf817b98715e8ed5ec7bf3cb545d501'/>
<id>384a8d8dfbf817b98715e8ed5ec7bf3cb545d501</id>
<content type='text'>
Since 4611:2b6cb7528409, responses from the gzip static, flv, and mp4
modules can be used with subrequests, though empty files were not properly
handled.  Empty gzipped, flv, and mp4 files thus resulted in "zero size buf
in output" alerts.  While valid files of these types are not expected to be
empty, such files shouldn't result in alerts.

Fix is to set b-&gt;sync on such empty subrequest responses, similarly to what
ngx_http_send_special() does.

Additionally, the static module, the ngx_http_send_response() function, and
the file cache are modified to do the same instead of not sending the
response body at all in such cases: omitting the body entirely is believed
to be at least questionable, and might break various filters which do not
expect such behaviour.
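
In the static module, for instance, the buffer flags for a file
response now end up as follows (a sketch; the surrounding buffer setup
is abbreviated):

    b-&gt;in_file = b-&gt;file_last ? 1 : 0;
    b-&gt;last_buf = (r == r-&gt;main) ? 1 : 0;
    b-&gt;last_in_chain = 1;

    /* a zero-length buffer which is neither special nor in a file is
     * marked as a sync buffer, much like in ngx_http_send_special(),
     * to avoid "zero size buf in output" alerts */

    b-&gt;sync = (b-&gt;last_buf || b-&gt;in_file) ? 0 : 1;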
</content>
</entry>
<entry>
<title>Style.</title>
<updated>2024-01-28T02:20:23+00:00</updated>
<author>
<name>Maxim Dounin</name>
<email>mdounin@mdounin.ru</email>
</author>
<published>2024-01-28T02:20:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.sigsegv.uk/nginx.git/commit/?id=856a01860e676bd5fe88d0f7ad7189e47cce04d9'/>
<id>856a01860e676bd5fe88d0f7ad7189e47cce04d9</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Cache: fixed race in ngx_http_file_cache_forced_expire().</title>
<updated>2022-02-01T13:29:28+00:00</updated>
<author>
<name>Maxim Dounin</name>
<email>mdounin@mdounin.ru</email>
</author>
<published>2022-02-01T13:29:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.sigsegv.uk/nginx.git/commit/?id=17c24244b454365b2c068081fa0637ec063cf59c'/>
<id>17c24244b454365b2c068081fa0637ec063cf59c</id>
<content type='text'>
During a configuration reload, two cache managers might exist for a short
time.  If both tried to delete the same cache node, the "ignore long locked
inactive cache entry" alert appeared in logs.  Additionally,
ngx_http_file_cache_forced_expire() might also be called by worker
processes, with similar results.

Fix is to ignore cache nodes being deleted, similarly to how it is
done in ngx_http_file_cache_expire() since 3755:76e3a93821b1.  This
was somehow missed in 7002:ab199f0eb8e8, when ignoring long locked
cache entries was introduced in ngx_http_file_cache_forced_expire().
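
That is, the scan loop in ngx_http_file_cache_forced_expire() now
skips nodes being deleted (a sketch; surrounding code omitted):

    fcn = ngx_queue_data(q, ngx_http_file_cache_node_t, queue);

    if (fcn-&gt;deleting) {
        continue;
    }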
</content>
</entry>
<entry>
<title>Cache: keep c-&gt;body_start when Vary changes (ticket #2029).</title>
<updated>2020-09-09T16:26:27+00:00</updated>
<author>
<name>Sergey Kandaurov</name>
<email>pluknet@nginx.com</email>
</author>
<published>2020-09-09T16:26:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.sigsegv.uk/nginx.git/commit/?id=dc1b14126e0a7a45018d95b149ebdb29985f18f1'/>
<id>dc1b14126e0a7a45018d95b149ebdb29985f18f1</id>
<content type='text'>
If the variant hash doesn't match the one we used as a secondary cache key,
we switch back to the original key.  In this case, c-&gt;body_start was kept
as updated from an existing cache node, overwriting the value from the new
response.  After the file cache update, this led to a discrepancy between
the cache node and the cache file, seen as critical errors "file cache ..
has too long header".
</content>
</entry>
<entry>
<title>Cache: reset c-&gt;body_start when reading a variant on Vary mismatch.</title>
<updated>2017-08-04T16:37:37+00:00</updated>
<author>
<name>Sergey Kandaurov</name>
<email>pluknet@nginx.com</email>
</author>
<published>2017-08-04T16:37:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.sigsegv.uk/nginx.git/commit/?id=ed0b19cdd4a9118cf8795ba2cf0a56684624bb41'/>
<id>ed0b19cdd4a9118cf8795ba2cf0a56684624bb41</id>
<content type='text'>
Previously, a variant not present in shared memory and stored on disk using
a secondary key was read using c-&gt;body_start from a variant stored with
the main key.  This could result in critical errors "cache file .. has too
long header".
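
A sketch of the fix (the exact placement in ngx_http_file_cache_reopen()
is an assumption based on the description):

    /* switching to the secondary key: reset c-&gt;body_start so it is
     * not inherited from the variant stored with the main key */

    c-&gt;secondary = 1;
    c-&gt;file.name.len = 0;
    c-&gt;body_start = c-&gt;buffer_size;

    ngx_memcpy(c-&gt;key, c-&gt;variant, NGX_HTTP_CACHE_KEY_LEN);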
</content>
</entry>
<entry>
<title>Cache: introduced min_free cache clearing.</title>
<updated>2020-06-22T15:03:00+00:00</updated>
<author>
<name>Maxim Dounin</name>
<email>mdounin@mdounin.ru</email>
</author>
<published>2020-06-22T15:03:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.sigsegv.uk/nginx.git/commit/?id=0a683fdd9313b9796bf39442fd117beaa63a7157'/>
<id>0a683fdd9313b9796bf39442fd117beaa63a7157</id>
<content type='text'>
Clearing the cache based on the free space left on a file system is
expected to allow better disk utilization in some cases, notably
when disk space might also be used for something other than the nginx
cache (including nginx's own temporary files), and while loading the
cache (when the cache size might be inaccurate for a while, effectively
disabling max_size cache clearing).

Based on a patch by Adam Bambuch.
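
Usage example (the path and sizes are made up; min_free is the new
parameter):

    proxy_cache_path /var/cache/nginx keys_zone=one:10m
                     max_size=10g min_free=2g;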
</content>
</entry>
<entry>
<title>Cache: improved keys zone size error reporting.</title>
<updated>2018-10-31T13:49:40+00:00</updated>
<author>
<name>Maxim Dounin</name>
<email>mdounin@mdounin.ru</email>
</author>
<published>2018-10-31T13:49:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.sigsegv.uk/nginx.git/commit/?id=b66ee453cc6bc1832c3f056c9a46240bd390617c'/>
<id>b66ee453cc6bc1832c3f056c9a46240bd390617c</id>
<content type='text'>
After this change, keys zones that are too small are explicitly reported
as such, much like in the other modules which use shared memory.
</content>
</entry>
<entry>
<title>Cache: fixed minimum cache keys zone size limit.</title>
<updated>2018-10-31T13:49:39+00:00</updated>
<author>
<name>Maxim Dounin</name>
<email>mdounin@mdounin.ru</email>
</author>
<published>2018-10-31T13:49:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.sigsegv.uk/nginx.git/commit/?id=f186a01901fbfe425873a3751f84899df4e28804'/>
<id>f186a01901fbfe425873a3751f84899df4e28804</id>
<content type='text'>
The size of a shared memory zone must be at least two pages: one page
for the slab allocator's internal data, and another page for actual
allocations.  Using a hardcoded 8192 bytes instead is wrong, as there are
systems with page sizes other than 4096.

Note well that two pages is usually too low as well.  In particular, the
cache is likely to use two allocations of different sizes for global
structures, and at least four pages will be needed to properly allocate
cache nodes.  Except in a few very special cases, nginx won't be able to
start with a keys zone of just two pages.  Other uses of shared memory
impose a minimum of 8 pages, which provides some room for global
allocations.  This patch doesn't try to address this, though.

Inspired by ticket #1665.
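
Schematically, the check becomes (a sketch; the variable names and the
explicit error message, in line with the error reporting change above,
are assumptions):

    if (size &lt; (ssize_t) (2 * ngx_pagesize)) {
        ngx_conf_log_error(NGX_CONF_EMERG, cf, 0,
                           "cache keys zone \"%V\" is too small", &amp;name);
        return NGX_CONF_ERROR;
    }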
</content>
</entry>
</feed>
