path: root/src/os/unix
Age | Commit message | Author | Files | Lines
2022-05-30 | Fixed runtime handling of systems without EPOLLRDHUP support. | Marcus Ball | 2 | -2/+6
In 7583:efd71d49bde0 (nginx 1.17.5), along with the introduction of ioctl(FIONREAD) support, proper handling of systems without EPOLLRDHUP support in the kernel (but with EPOLLRDHUP in headers) was broken. Before the change, rev->available was never set to 0 unless ngx_use_epoll_rdhup was also set (that is, unless the runtime test for EPOLLRDHUP introduced in 6536:f7849bfb6d21 succeeded). After the change, rev->available might reach 0 on systems without runtime EPOLLRDHUP support, stopping further reading in ngx_readv_chain() and ngx_unix_recv(). And, if EOF happened to be already reported along with the last event, it is not reported again by epoll_wait(), leading to connection hangs and timeouts on such systems. This affects Linux kernels before 2.6.17 if nginx was compiled with newer headers and, more importantly, emulation layers such as DigitalOcean's App Platform's / gVisor's epoll emulation layer. The fix is to explicitly check ngx_use_epoll_rdhup before the corresponding rev->pending_eof tests in ngx_readv_chain() and ngx_unix_recv().
2022-01-26 | Core: added autotest for UDP segmentation offloading. | Vladimir Homutov | 1 | -0/+4
2022-01-25 | Core: added function for local source address cmsg. | Vladimir Homutov | 1 | -0/+65
2022-01-25 | Core: made the ngx_sendmsg() function non-static. | Vladimir Homutov | 1 | -70/+95
The NGX_HAVE_ADDRINFO_CMSG macro is defined when at least one of the methods to deal with the corresponding control message is available.
2021-12-27 | Support for sendfile(SF_NOCACHE). | Maxim Dounin | 1 | -0/+6
The SF_NOCACHE flag, introduced in FreeBSD 11 along with the new non-blocking sendfile() implementation by glebius@, makes it possible to use sendfile() along with the "directio" directive.
2021-12-27 | Simplified sendfile(SF_NODISKIO) usage. | Maxim Dounin | 1 | -46/+28
Starting with FreeBSD 11, there is no need to use AIO operations to preload data into cache for sendfile(SF_NODISKIO) to work. Instead, sendfile() handles non-blocking loading of data from disk by itself. It can, however, still return EBUSY if a page is already being loaded (for example, by a different process). If this happens, we now post an event for the next event loop iteration, so sendfile() is retried "after a short period", as the manpage recommends. The limit on the number of EBUSY errors tolerated without any progress is preserved, but reaching it no longer results in an alert, since on an idle system an event loop iteration might be very short and EBUSY can happen many times in a row. Instead, SF_NODISKIO is simply disabled for one call once the limit is reached. With this change, sendfile(SF_NODISKIO) is now used automatically as long as sendfile() is enabled, and no longer requires "aio on;".
2021-11-25 | HTTP/2: fixed "task already active" with sendfile in threads. | Maxim Dounin | 2 | -22/+0
With sendfile in threads, "task already active" alerts might appear in logs if a write event happens on the main HTTP/2 connection, triggering a sendfile in threads while another thread operation is already running. Observed with "aio threads; aio_write on; sendfile on;" and with thread event handlers modified to post a write event to the main HTTP/2 connection (though it can happen without any modifications). Similarly, sendfile() with AIO preloading on FreeBSD can trigger a duplicate aio operation, resulting in "second aio post" alerts. This is, however, harder to reproduce, especially on modern FreeBSD systems, since sendfile() usually does not return EBUSY. The fix is to avoid starting a sendfile operation if another thread operation is active, by checking r->aio in the thread handler (and, similarly, in the aio preload handler). The added check also makes the duplicate-call protection redundant, so it is removed.
2021-10-29 | Fixed sendfile() limit handling on Linux. | Maxim Dounin | 1 | -1/+3
On Linux starting with 2.6.16, sendfile() silently limits all operations to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly triggered the interrupt check and resulted in a 0-sized writev() on the next loop iteration. The fix is to make sure the limit is always checked, so we return from the loop if the limit is already reached, even if the number of bytes sent is not exactly equal to the number of bytes we tried to send.
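To make the numbers concrete, here is a small sketch of the arithmetic. The helper names are hypothetical and are not nginx functions. With 4096-byte pages, PAGE_MASK clears the low 12 bits, so MAX_RW_COUNT is 0x7FFFF000, or 2147479552 bytes. A larger request is silently clamped to that value, so a shorter-than-requested result is expected here and does not indicate an interruption.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Compute the kernel's per-call cap, MAX_RW_COUNT = (INT_MAX & PAGE_MASK),
 * for a given page size. PAGE_MASK is ~(page_size - 1): it clears the
 * offset-within-page bits, so the result is INT_MAX rounded down to a
 * page boundary.
 */
static size_t max_rw_count(size_t page_size)
{
    size_t page_mask = ~(page_size - 1);

    return (size_t) INT_MAX & page_mask;
}

/* Hypothetical helper: how many bytes one sendfile() call can move at most. */
static size_t sendfile_effective(size_t requested, size_t page_size)
{
    size_t cap = max_rw_count(page_size);

    return requested < cap ? requested : cap;
}
```

A request below the cap passes through unchanged. Only requests above roughly 2 GiB are clamped.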
2021-08-30 | Give GCC atomics precedence over deprecated Darwin atomic(3). | Sergey Kandaurov | 1 | -33/+33
This allows building nginx on macOS with -Wdeprecated-declarations.
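Below is a minimal sketch of atomics built on the GCC __sync builtins, which Clang also provides, instead of Darwin's deprecated OSAtomic*() functions. The type and function names are hypothetical stand-ins and are not nginx's own.

```c
#include <stdint.h>

/* Hypothetical atomic word type. */
typedef volatile uintptr_t hyp_atomic_t;

/* Atomically add 'add' to *value and return the value held before. */
static uintptr_t hyp_atomic_fetch_add(hyp_atomic_t *value, uintptr_t add)
{
    return __sync_fetch_and_add(value, add);
}

/* Atomically set *lock to 'set' if it currently equals 'old'; 1 on success. */
static int hyp_atomic_cmp_set(hyp_atomic_t *lock, uintptr_t old, uintptr_t set)
{
    return __sync_bool_compare_and_swap(lock, old, set) ? 1 : 0;
}
```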
2021-07-05 | Use only preallocated memory in ngx_readv_chain() (ticket #1408). | Ruslan Ermilov | 1 | -1/+1
In d1bde5c3c5d2, the number of preallocated iovec's for ngx_readv_chain() was increased. Still, in some setups, the function might allocate memory for iovec's from a connection pool, which is only freed when closing the connection. The ngx_readv_chain() function was modified to use only preallocated memory, similarly to the ngx_writev_chain() change in 8e903522c17a.
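The sketch below illustrates the idea. It assumes a simplified, hypothetical buffer-chain structure rather than nginx's ngx_chain_t. The caller supplies the iovec array with a fixed capacity. When the array is full, the function stops instead of allocating more entries, and the remaining links are left for the next read.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical, simplified chain link describing free space in a buffer. */
struct buf_link {
    char            *last;   /* first free byte in the buffer */
    char            *end;    /* one past the end of the buffer */
    struct buf_link *next;
};

/*
 * Fill at most 'nalloc' entries of a caller-provided iovec array with
 * the free space of the chain. Never allocates; returns entries used.
 */
static int chain_to_iovec(struct buf_link *cl, struct iovec *iov, int nalloc)
{
    int n = 0;

    for ( /* void */ ; cl != NULL && n < nalloc; cl = cl->next) {
        if (cl->last == cl->end) {
            continue;                 /* this buffer is already full */
        }

        iov[n].iov_base = cl->last;
        iov[n].iov_len = (size_t) (cl->end - cl->last);
        n++;
    }

    return n;
}
```

The result can be passed directly to readv(fd, iov, n).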
2021-04-22 | Restored zeroing of ngx_channel_t in ngx_pass_open_channel(). | Ruslan Ermilov | 1 | -0/+2
Due to the structure's alignment, some uninitialized memory contents may have been passed between processes. Zeroing was removed in 0215ec9aaa8a. Reported by Johnny Wang.
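Here is a minimal sketch of why the zeroing matters. It uses a hypothetical message struct, not the real ngx_channel_t. Member alignment leaves padding bytes that field assignments never touch. Without a memset(), those bytes keep whatever was on the stack when the struct is written to a socket as raw bytes.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical message: common ABIs insert padding after 'command'. */
struct channel_msg {
    unsigned char command;
    pid_t         pid;
    long          slot;
    int           fd;
};

/* Zero the whole object, padding included, before setting the fields. */
static void channel_msg_init(struct channel_msg *ch, unsigned char command,
                             pid_t pid, long slot, int fd)
{
    memset(ch, 0, sizeof(*ch));
    ch->command = command;
    ch->pid = pid;
    ch->slot = slot;
    ch->fd = fd;
}
```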
2021-03-11 | Removed "ch" argument from ngx_pass_open_channel(). | Ruslan Ermilov | 1 | -39/+18
2021-03-01 | Introduced strerrordesc_np() support. | Maxim Dounin | 1 | -1/+45
The strerrordesc_np() function, introduced in glibc 2.32, provides an async-signal-safe way to obtain error messages. This makes it possible to avoid copying error messages.
2021-03-01 | Improved maximum errno detection. | Maxim Dounin | 1 | -6/+85
Previously, systems without sys_nerr (or _sys_nerr) were handled with the assumption that errors start at 0 and are contiguous. This is, however, not something POSIX requires, and it is not true on some platforms. Notably, on Linux, where sys_nerr is no longer available for newly linked binaries starting with glibc 2.32, there are gaps in the error list, which used to stop us from properly detecting the maximum errno. Further, on GNU/Hurd errors start at 0x40000001. With this change, maximum errno detection is moved to the runtime code, which is now able to ignore gaps and also detects the first error if needed. This fixes the "Unknown error" messages observed on Linux with glibc 2.32 and on GNU/Hurd.
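Below is a minimal sketch of gap-tolerant detection. It rests on two assumptions. First, unknown values can be recognized by the libc's fallback text: "Unknown error N" in glibc and "No error information" in musl. Second, scanning from 0 up to a caller-chosen limit is enough. The second assumption does not hold on GNU/Hurd, where errors start at 0x40000001. The function names are hypothetical.

```c
#include <string.h>

/* Heuristic: does strerror() return the libc's "unknown" fallback text? */
static int errno_is_unknown(int err)
{
    const char *msg = strerror(err);

    return strncmp(msg, "Unknown error", 13) == 0
           || strcmp(msg, "No error information") == 0;
}

/*
 * Scan errno values 0..limit-1 and record the first and the last value
 * with a real message. Gaps are skipped rather than ending the scan.
 * Returns -1 if no known value was found.
 */
static int errno_range(int limit, int *first, int *last)
{
    *first = -1;
    *last = -1;

    for (int err = 0; err < limit; err++) {
        if (errno_is_unknown(err)) {
            continue;               /* a gap: keep scanning */
        }

        if (*first == -1) {
            *first = err;
        }

        *last = err;
    }

    return *first == -1 ? -1 : 0;
}
```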
2020-06-22 | Cache: introduced min_free cache clearing. | Maxim Dounin | 2 | -0/+34
Clearing cache based on free space left on a file system is expected to allow better disk utilization in some cases, notably when disk space might also be used for something other than nginx cache (including nginx's own temporary files) and while loading cache (when the cache size might be inaccurate for a while, effectively disabling max_size cache clearing). Based on a patch by Adam Bambuch.
2020-06-22 | Too large st_blocks values are now ignored (ticket #157). | Maxim Dounin | 1 | -1/+4
With XFS, using the "allocsize=64m" mount option results in large preallocation being reported in st_blocks, as returned by fstat(), until the file is closed. This in turn results in incorrect cache size calculations and wrong clearing based on max_size. To avoid too aggressive cache clearing on such volumes, st_blocks values which result in sizes larger than st_size plus eight blocks (an arbitrary limit) are no longer trusted, and st_size is used instead. The ngx_de_fs_size() counterpart is intentionally not modified, as it is used on closed files and hence not affected by this problem.
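The heuristic can be written as a small pure function. This is a sketch with a hypothetical name. Its plain integer parameters stand in for the st_size, st_blocks (counted in 512-byte units) and st_blksize fields of struct stat.

```c
/*
 * Size to account for a file: trust st_blocks * 512 only when it
 * exceeds st_size by less than eight blocks, otherwise use st_size.
 */
static long long fs_size_from_stat(long long size, long long blocks,
                                   long long blksize)
{
    long long allocated = blocks * 512;

    if (allocated > size && allocated < size + 8 * blksize) {
        return allocated;   /* ordinary rounding up to whole blocks */
    }

    return size;            /* sparse file or suspicious preallocation */
}
```

For example, a 1000-byte file occupying one 4096-byte block accounts as 4096 bytes. The same file with 64 MiB preallocated (131072 blocks) falls back to 1000.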
2020-06-22 | Large block sizes on Linux are now ignored (ticket #1168). | Maxim Dounin | 1 | -0/+12
NFS on Linux is known to report wsize as a block size (in both f_bsize and f_frsize, both in statfs() and statvfs()). On the other hand, typical file system block sizes on Linux (ext2/ext3/ext4, XFS) are limited to the page size. (With FAT, block sizes can be at least up to 512k in extreme cases, but this doesn't really matter, see below.) To avoid too aggressive cache clearing on NFS volumes on Linux, block sizes larger than the page size are now ignored. Note that it is safe to ignore large block sizes. Since 3899:e7cd13b7f759 (1.0.1), cache size is calculated based on fstat() st_blocks, and rounding to the file system block size is preserved mostly for Windows. Note well that on other OSes, valid block sizes of up to at least 65536 are seen. In particular, UFS on FreeBSD is known to work well with block and fragment sizes set to 65536.
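Here is a sketch of the block-size sanity check, with a hypothetical function name. Falling back to 512 bytes for an implausible value is an assumption of this sketch. As the message above notes, the page-size rule applies only on Linux, because other systems legitimately report block sizes up to at least 65536.

```c
#include <stddef.h>

/*
 * 'reported' is f_bsize (or f_frsize) from statfs()/statvfs();
 * 'pagesize' is the system page size. Linux-only rule.
 */
static size_t sane_fs_bsize(size_t reported, size_t pagesize)
{
    if (reported == 0 || reported % 512 != 0) {
        return 512;          /* not a plausible block size */
    }

    if (reported > pagesize) {
        return 512;          /* e.g. NFS wsize reported as a block size */
    }

    return reported;
}
```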
2020-06-08 | Stream: fixed processing of zero length UDP packets (ticket #1982). | Vladimir Homutov | 1 | -0/+7
2020-06-01 | Fixed SIGQUIT not removing listening UNIX sockets (closes #753). | Ruslan Ermilov | 1 | -12/+2
Listening UNIX sockets were not removed on graceful shutdown, preventing subsequent runs. The fix is to replace the custom socket closing code in ngx_master_process_cycle() with an ngx_close_listening_sockets() call.
2019-10-17 | Events: available bytes calculation via ioctl(FIONREAD). | Maxim Dounin | 3 | -2/+77
This makes it possible to avoid looping for a long time while working with a fast enough peer when data are added to the socket buffer faster than we are able to read and process them (ticket #1431). This is basically what we already do on FreeBSD with kqueue, where information about the number of bytes in the socket buffer is returned by the kevent() call. With other event methods, rev->available is now set to -1 when the socket is ready for reading. Later, in ngx_recv() and ngx_recv_chain(), if a full buffer is received, the real number of bytes in the socket buffer is retrieved using ioctl(FIONREAD). Reading more than this number of bytes ensures that even with edge-triggered event methods the event will be triggered again, so it is safe to stop processing the socket and switch to other connections. Using ioctl(FIONREAD) only after reading a full buffer is an optimization: with this approach, we only call ioctl(FIONREAD) when there are at least two recv()/readv() calls.
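Below is a minimal sketch of the "ioctl only after a full buffer" pattern, with hypothetical helper names. It uses plain read() for brevity. FIONREAD works on sockets and, on Linux, also on pipes.

```c
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel how many bytes are queued on fd; -1 on error. */
static int bytes_available(int fd)
{
    int n;

    if (ioctl(fd, FIONREAD, &n) == -1) {
        return -1;
    }

    return n;
}

/*
 * Read up to 'size' bytes. Only a read that fills the whole buffer pays
 * for the extra ioctl(); a short read suggests the queue was drained.
 * *left receives the remaining queued bytes, or 0 after a short read.
 */
static ssize_t read_and_check(int fd, char *buf, size_t size, int *left)
{
    ssize_t n = read(fd, buf, size);

    *left = 0;

    if (n == (ssize_t) size) {
        *left = bytes_available(fd);
    }

    return n;
}
```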
2019-01-28 | Fixed portability issues with union sigval. | Sergey Kandaurov | 2 | -1/+7
AIO support in nginx was originally developed against FreeBSD versions 4-6, where the sival_ptr field was named sigval_ptr (seemingly by mistake [1]), which made nginx use the only name available at the time. The standard-compliant name was restored in 2005 (first appearing in FreeBSD 7.0, 2008), retaining compatibility with previous versions [2][3]. In DragonFly, similar changes were committed in 2009 [4], with backward compatibility recently removed [5]. The change switches to the standard name, retaining compatibility with old FreeBSD versions.
[1] https://svnweb.freebsd.org/changeset/base/48621
[2] https://svnweb.freebsd.org/changeset/base/152029
[3] https://svnweb.freebsd.org/changeset/base/174003
[4] https://gitweb.dragonflybsd.org/dragonfly.git/commit/3693401
[5] https://gitweb.dragonflybsd.org/dragonfly.git/commit/7875042
2018-12-24 | Win32: removed NGX_DIR_MASK concept. | Maxim Dounin | 1 | -3/+0
The previous interface of ngx_open_dir() assumed that the passed directory name had room for NGX_DIR_MASK at the end (NGX_DIR_MASK_LEN bytes). While all direct users of ngx_dir_open() followed this interface, this also implied similar requirements for indirect uses, in particular via ngx_walk_tree(). Currently none of the ngx_walk_tree() uses provide appropriate space, and fixing this does not look like the right way to go. Instead, the ngx_dir_open() interface was changed to not require any additional space and to use appropriate allocations instead.
2018-07-22 | Fixed NGX_TID_T_FMT format specification for uint64_t. | Maxim Dounin | 1 | -2/+2
Previously, "%uA" was used, which corresponds to ngx_atomic_uint_t. The size of ngx_atomic_uint_t can easily differ from that of uint64_t, leading to undefined results.
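For comparison, here is a sketch in standard C rather than nginx's own ngx_sprintf() formats. The conversion must match the argument's actual type, and for uint64_t that is the PRIu64 macro from <inttypes.h>. The helper name is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit thread id with the conversion that matches uint64_t. */
static int format_tid(char *buf, size_t size, uint64_t tid)
{
    return snprintf(buf, size, "tid:%" PRIu64, tid);
}
```

Called with UINT64_MAX, this writes "tid:18446744073709551615" and returns 24.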
2018-05-23 | Removed glibc crypt_r() bug workaround (ticket #1469). | Maxim Dounin | 1 | -4/+0
The bug in question was fixed in glibc 2.3.2 and is no longer expected to manifest itself on real servers. On the other hand, the workaround causes compilation problems on various systems. Previously, we've already fixed the code to compile with musl libc (fd6fd02f6a4d), and now it is broken on Fedora 28 where glibc's crypt library was replaced by libxcrypt. So the workaround was removed.
2018-03-19 | Fixed checking ngx_tcp_push() and ngx_tcp_nopush() return values. | Ruslan Ermilov | 2 | -2/+2
No functional changes.
2017-12-19 | Fixed capabilities version. | Roman Arutyunyan | 1 | -1/+1
Previously, capset(2) was called with the 64-bit capabilities version _LINUX_CAPABILITY_VERSION_3. With this version, the Linux kernel expected two copies of struct __user_cap_data_struct, while only one was submitted. As a result, random stack memory was accessed and random capabilities were requested by the worker. This sometimes caused capset() errors. Now the 32-bit version _LINUX_CAPABILITY_VERSION_1 is used instead. This is OK since CAP_NET_RAW is a 32-bit capability (CAP_NET_RAW = 13).
2017-12-18 | Improved the capabilities feature detection. | Roman Arutyunyan | 2 | -2/+2
The previously included file sys/capability.h, mentioned in the capset(2) man page, belongs to the libcap-dev package, which may not be installed on some Linux systems when compiling nginx. This prevented the capabilities feature from being detected and compiled on those systems. Now the linux/capability.h system header is included instead. Since the capset() declaration is located in sys/capability.h, the capset() syscall is now defined explicitly in code using the SYS_capset constant, similarly to other Linux-specific features in nginx.
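Here is a minimal sketch of both points. It makes the raw syscall with <linux/capability.h> and the 32-bit header version, which pairs with exactly one __user_cap_data_struct. It calls SYS_capget rather than SYS_capset so that it runs without privileges. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Read the calling thread's effective capability set. The header
 * version _LINUX_CAPABILITY_VERSION_1 makes the kernel copy exactly one
 * data struct; with _LINUX_CAPABILITY_VERSION_3 it would copy two.
 */
static int get_effective_caps(unsigned int *effective)
{
    struct __user_cap_header_struct header = {
        .version = _LINUX_CAPABILITY_VERSION_1,
        .pid = 0,                       /* 0 means the calling thread */
    };
    struct __user_cap_data_struct data = { 0 };

    if (syscall(SYS_capget, &header, &data) == -1) {
        return -1;
    }

    *effective = data.effective;
    return 0;
}
```

To test for the capability discussed above, check (eff >> CAP_NET_RAW) & 1. With _LINUX_CAPABILITY_VERSION_3, the data argument would have to be an array of two structs.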
2017-12-13 | Retain CAP_NET_RAW capability for transparent proxying. | Roman Arutyunyan | 2 | -0/+37
The capability is retained automatically in unprivileged worker processes after changing UID if transparent proxying is enabled at least once in nginx configuration. The feature is only available in Linux.
2017-12-11 | Use sysconf to determine cacheline size at runtime. | Debayan Ghosh | 1 | -0/+10
Determine the cacheline size at runtime, if supported, using sysconf(_SC_LEVEL1_DCACHE_LINESIZE). If it is not supported, fall back to compile-time defaults.
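Below is a minimal sketch that assumes 64 bytes as the compile-time fallback. The #ifdef guard covers libcs that do not define _SC_LEVEL1_DCACHE_LINESIZE, and a zero or negative answer is also treated as unsupported. The macro and function names are hypothetical.

```c
#include <unistd.h>

#define HYP_DEFAULT_CACHELINE 64   /* assumed compile-time fallback */

/* L1 data cache line size from sysconf() if available, else the default. */
static long cacheline_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);

    if (n > 0) {
        return n;
    }
#endif

    return HYP_DEFAULT_CACHELINE;
}
```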
2017-11-28 | Removed unused FreeBSD-specific definitions in ngx_posix_config.h. | Sergey Kandaurov | 1 | -20/+0
2017-11-28 | Fixed "changing binary" when reaper is not init. | Ruslan Ermilov | 4 | -3/+8
On some systems, the reaper of orphaned processes may be set to something other than the "init" process. On such systems, the changing binary procedure did not work. The fix is to check whether the PPID has changed, instead of assuming it is always 1 for orphaned processes.
2017-09-18 | Removed more remnants of the old pthread implementation. | Ruslan Ermilov | 1 | -10/+0
After e284f3ff6831, ngx_crypt() can no longer return NGX_AGAIN.
2017-08-09 | Style. | Sergey Kandaurov | 1 | -0/+1
2017-06-01 | Style. | Maxim Dounin | 1 | -3/+3
2017-04-27 | Added missing "fall through" comments (ticket #1259). | Maxim Dounin | 1 | -0/+1
Found by gcc7 (-Wimplicit-fallthrough).
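Here is a minimal illustration with hypothetical state names. GCC 7's -Wimplicit-fallthrough warns when one case runs into the next unless the fall-through is marked, and a /* fall through */ comment is one of the markers it accepts at the default warning level.

```c
/* Hypothetical parser states; each earlier state also runs later steps. */
enum state { STATE_START, STATE_NAME, STATE_DONE };

static int count_steps(enum state s)
{
    int steps = 0;

    switch (s) {

    case STATE_START:
        steps++;
        /* fall through */

    case STATE_NAME:
        steps++;
        /* fall through */

    case STATE_DONE:
        steps++;
        break;
    }

    return steps;
}
```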
2017-04-20 | Core: signal sender pid logging. | Igor Sysoev | 1 | -8/+24
2017-04-11 | Set UDP datagram source address (ticket #1239). | Roman Arutyunyan | 1 | -0/+90
Previously, the source IP address of a response UDP datagram could differ from the original datagram's destination address. This could happen if the server UDP socket is bound to a wildcard address and the network interface chosen to output the response packet has a different default address than the destination address of the original packet, for example, if two addresses from the same network are configured on an interface. Now the source address is set explicitly if a response is sent on a server UDP socket bound to a wildcard address.
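On Linux, one way to pin the source address is an IP_PKTINFO control message passed to sendmsg(). When ipi_spec_dst is non-zero, the kernel uses it as the local source address. The sketch below is Linux-specific and uses a hypothetical helper name. FreeBSD uses IP_SENDSRCADDR instead, and nginx's own code is not reproduced here.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Attach an IP_PKTINFO control message requesting 'src' as the source
 * address. 'control' must be suitably aligned and at least
 * CMSG_SPACE(sizeof(struct in_pktinfo)) bytes; returns -1 otherwise.
 */
static int set_udp_source(struct msghdr *msg, void *control,
                          size_t controllen, struct in_addr src)
{
    struct cmsghdr    *cmsg;
    struct in_pktinfo  pkt;

    if (controllen < CMSG_SPACE(sizeof(struct in_pktinfo))) {
        return -1;                   /* caller's buffer is too small */
    }

    memset(control, 0, controllen);
    msg->msg_control = control;
    msg->msg_controllen = CMSG_SPACE(sizeof(struct in_pktinfo));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_IP;
    cmsg->cmsg_type = IP_PKTINFO;
    cmsg->cmsg_len = CMSG_LEN(sizeof(struct in_pktinfo));

    memset(&pkt, 0, sizeof(pkt));
    pkt.ipi_spec_dst = src;          /* requested local source address */
    memcpy(CMSG_DATA(cmsg), &pkt, sizeof(pkt));

    return 0;
}
```

The caller then sets msg_name, msg_iov and msg_iovlen as usual and calls sendmsg() on the wildcard-bound socket.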
2017-04-17 | Enabled IPV6_RECVPKTINFO / IPV6_PKTINFO on macOS. | Sergey Kandaurov | 1 | -0/+3
This change allows setting the destination IPv6 address of a UDP datagram received on a wildcard socket.
2017-03-28 | Simplified and improved sendfile() code on Linux. | Maxim Dounin | 1 | -67/+47
The ngx_linux_sendfile() function is now used for both normal sendfile() and sendfile in threads. The ngx_linux_sendfile_thread() function was modified to use the same interface as ngx_linux_sendfile(), and is simply called from ngx_linux_sendfile() when threads are enabled. The special return code NGX_DONE is used to indicate that a thread task was posted and no further actions are needed. If the number of bytes sent is less than what we were sending, we now always retry sending. This is needed for sendfile() in threads, as the number of bytes we are sending might have changed since the thread task was posted. It is also needed for Linux 4.3+, as sendfile() might be interrupted at any time and provides no indication of whether it was interrupted or not (ticket #1174).
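Below is a minimal sketch of the "always retry on a short count" rule for Linux sendfile(), with a hypothetical helper name. A short count is a reason to call again, not an error. The loop stops only on EAGAIN, end of file, or a real error.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/*
 * Send 'count' bytes of in_fd, starting at *offset, to out_fd.
 * Returns the number of bytes sent, or -1 on a real error.
 */
static ssize_t sendfile_all(int out_fd, int in_fd, off_t *offset, size_t count)
{
    size_t sent = 0;

    while (sent < count) {
        ssize_t n = sendfile(out_fd, in_fd, offset, count - sent);

        if (n == -1) {
            if (errno == EINTR) {
                continue;          /* interrupted before any data moved */
            }

            if (errno == EAGAIN) {
                break;             /* non-blocking socket buffer is full */
            }

            return -1;
        }

        if (n == 0) {
            break;                 /* file ended earlier than expected */
        }

        sent += (size_t) n;        /* short count: just loop and retry */
    }

    return (ssize_t) sent;
}
```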
2017-03-16 | Added missing "static" specifier found by gcc -Wtraditional. | Ruslan Ermilov | 1 | -1/+1
This has somehow escaped from fbdaad9b0e7b.
2017-03-07 | Style. | Maxim Dounin | 1 | -2/+0
2017-03-07 | Introduced worker_shutdown_timeout. | Maxim Dounin | 1 | -0/+1
The directive configures a timeout to be used when gracefully shutting down worker processes. When the timer expires, nginx will try to close all the connections currently open to facilitate shutdown.
2017-03-07 | Cancelable timers are now preserved if there are other timers. | Maxim Dounin | 1 | -4/+1
There is no need to cancel timers early if there are other timers blocking shutdown anyway. Preserving such timers allows nginx to continue some periodic work until the shutdown is actually possible. With the new approach, timers with ev->cancelable are simply ignored when checking whether any timers are left during shutdown.
2017-01-20 | Removed pthread mutex / conditional variables debug messages. | Maxim Dounin | 2 | -20/+0
These messages don't seem to be needed in practice and only make debugging logs harder to read.
2017-01-20 | Fixed trailer construction with limit on FreeBSD and macOS. | Maxim Dounin | 2 | -7/+15
The ngx_chain_coalesce_file() function may produce more bytes to send than requested in the limit passed, as it aligns the last file position to send to a memory page boundary. As a result, (limit - send) may become negative. This resulted in a big positive number when converted to size_t while calling ngx_output_chain_to_iovec(). Another part of the problem is in ngx_chain_coalesce_file(): it changes cl to the next chain link even if the current buffer is only partially sent due to the limit. Therefore, if a file buffer was not expected to be fully sent due to the limit, and was followed by a memory buffer, nginx called sendfile() with a part of the file buffer and the memory buffer in the trailer. If there was enough room in the socket buffer, this resulted in a part of the file buffer being skipped, and the corresponding part of the memory buffer being sent instead. The bug was introduced in 8e903522c17a (1.7.8). Affected configurations are those using limits, that is, limit_rate and/or sendfile_max_chunk, and memory buffers after file ones (which may happen when using subrequests or with proxying with disk buffering). The fix is to explicitly check if (send < limit) before constructing the trailer with ngx_output_chain_to_iovec(). Additionally, ngx_chain_coalesce_file() was modified to preserve unfinished file buffers in cl.
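Here is a minimal sketch of the signed-to-unsigned pitfall and its guard, with a hypothetical helper name. Converting a negative off_t difference to size_t wraps to a value near SIZE_MAX, so the comparison has to happen before the subtraction.

```c
#include <stddef.h>
#include <sys/types.h>

/* Bytes still allowed under 'limit' after 'send'; 0 if already at or past it. */
static size_t room_under_limit(off_t limit, off_t send)
{
    if (send >= limit) {
        return 0;              /* limit reached or overshot by rounding */
    }

    return (size_t) (limit - send);
}
```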
2016-10-05 | Cache: cache manager limits. | Dmitry Volyntsev | 1 | -3/+3
The new parameters "manager_files", "manager_sleep" and "manager_threshold" were added to proxy_cache_path and friends. Note that ngx_path_manager_pt was changed to return ngx_msec_t instead of time_t (API change).
2016-09-15 | Stream: filters. | Roman Arutyunyan | 7 | -0/+253
2016-08-04 | Always seed PRNG with PID, seconds, and milliseconds. | Ruslan Ermilov | 2 | -3/+7
2016-08-04 | Fixed undefined behavior when left shifting signed integer. | Ruslan Ermilov | 1 | -1/+1
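Below is a minimal sketch of such a seed. It assumes XOR as the way to combine the three values, since the message does not spell out the expression, and the helper names are hypothetical. The cast to unsigned before shifting is what avoids the undefined behavior: left-shifting a signed int so that the result is not representable is undefined in C, while unsigned arithmetic simply wraps.

```c
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Combine PID, seconds and milliseconds; shift only the unsigned value. */
static unsigned int make_seed(pid_t pid, long sec, long msec)
{
    return ((unsigned int) pid << 16) ^ (unsigned int) sec
           ^ (unsigned int) msec;
}

static void seed_prng(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    srandom(make_seed(getpid(), (long) tv.tv_sec, (long) (tv.tv_usec / 1000)));
}
```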
2016-06-08 | Fixed spelling. | Otto Kekäläinen | 1 | -1/+1