unit.git/src, branch 1.29.1

Set a safer umask(2) when running as a daemon.

2023-02-23T12:01:14+00:00

When running as a daemon, unit currently sets umask(0), i.e. no umask.
This results in various directories being created with a mode of
0777, e.g.

  rwxrwxrwx

This currently affects cgroup and rootfs directories, which are
created with a mode of 0777 when running as a daemon, as there is no
umask to restrict the permissions.

This also affects the language modules (the umask is inherited over
fork(2)) whereby unless something explicitly sets a umask, files and
directories will be created with full permissions, 0666 (rw-rw-rw-)/
0777 (rwxrwxrwx) respectively.

This could be an unwitting security issue.

My original idea was to just remove the umask(0) call and thus inherit
the umask from the executing shell/program.

However there was some concern about just inheriting whatever umask was
in effect.

Alex suggested that rather than simply removing the umask(0) call we
change it to a value of 022 (which is a common default), which will
result in directories and files being created with permissions of at
most 0755 (rwxr-xr-x) and 0644 (rw-r--r--) respectively.

If applications need some other umask set, they can (as they always have
been able to) set their own umask(2).
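To illustrate the effect of the mask, here is a minimal sketch in Python (not Unit's C code). The helper name modes_under_umask() is hypothetical; it creates a file and a directory while requesting the most permissive modes and reports what the kernel actually granted under a given umask.

```python
import os
import tempfile


def modes_under_umask(mask):
    """Create a file and a directory under `mask`; return their permission bits."""
    old = os.umask(mask)
    try:
        with tempfile.TemporaryDirectory() as base:
            file_path = os.path.join(base, "file")
            dir_path = os.path.join(base, "dir")
            # Request the most permissive modes; the umask decides what is granted.
            fd = os.open(file_path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            os.mkdir(dir_path, 0o777)
            return (os.stat(file_path).st_mode & 0o777,
                    os.stat(dir_path).st_mode & 0o777)
    finally:
        # Always restore the caller's umask.
        os.umask(old)
```

Calling modes_under_umask(0o022) should give (0o644, 0o755), while modes_under_umask(0) gives (0o666, 0o777), matching the modes described above.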

Suggested-by: Alejandro Colomar 
Reviewed-by: Liam Crilly 
Signed-off-by: Andrew Clayton

Isolation: rootfs: Set the sticky bit on the tmp directory.

2023-02-22T16:04:53+00:00

When using the 'rootfs' isolation option, by default a tmpfs filesystem
is mounted on tmp/. Currently this is mounted with a mode of 0777, i.e.

  drwxrwxrwx.   3 root   root   60 Feb 22 11:56 tmp

however this should really have the sticky bit[0] set (as is normal for
such directories) to prevent users from having free rein over the files
contained within.

What we really want is for it to be mounted with a mode of 01777, i.e.

  drwxrwxrwt.   3 root   root   60 Feb 22 11:57 tmp

[0]: To quote inode(7)

 "The sticky bit (S_ISVTX) on a directory means that a file in that
  directory can be renamed or deleted only by the owner of the file, by
  the owner of the directory, and by a privileged process."
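As a rough illustration of the target mode, the following Python sketch creates an ordinary directory (not a tmpfs mount, which needs privileges) and sets it to 01777. The helper make_sticky_tmp() is a hypothetical name used only for this example.

```python
import os
import stat
import tempfile


def make_sticky_tmp(parent):
    """Create parent/tmp with mode 01777 and return its st_mode."""
    path = os.path.join(parent, "tmp")
    os.mkdir(path)
    # Use chmod rather than the mkdir mode argument so that the result
    # does not depend on the umask in effect.
    os.chmod(path, stat.S_ISVTX | 0o777)
    return os.stat(path).st_mode
```

Passing the returned mode to stat.filemode() renders it in the same form as the ls output above, with a trailing 't' in place of the final 'x'.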

Reviewed-by: Liam Crilly 
Signed-off-by: Andrew Clayton

Remove the nxt_getpid() alias.

2022-12-01T21:05:39+00:00

Since the previous commit, nxt_getpid() is only ever aliased to
getpid(2).

nxt_getpid() was only used once in the code, while there are multiple
direct uses of getpid(2)

  $ grep -r "getpid()" src/
  src/nxt_unit.c:    nxt_unit_pid = getpid();
  src/nxt_process.c:    nxt_pid = nxt_getpid();
  src/nxt_process.c:    nxt_pid = getpid();
  src/nxt_lib.c:    nxt_pid = getpid();
  src/nxt_process.h:#define nxt_getpid()                                                          \
  src/nxt_process.h:#define nxt_getpid()                                                          \
  src/nxt_process.h:    getpid()

Just remove it and convert the _single_ instance of nxt_getpid() to
getpid(2).

Reviewed-by: Alejandro Colomar 
Signed-off-by: Andrew Clayton

Isolation: Remove the syscall(SYS_getpid) wrapper.

2022-11-19T23:58:51+00:00

When using SYS_clone we used the getpid kernel system call directly via
syscall(SYS_getpid) to avoid issues with cached pids.

However since we are now only using fork(2) (+ unshare(2) for
namespaces) we no longer need to call the kernel getpid directly as the
fork(2) will ensure the cached pid is invalidated.

Reviewed-by: Alejandro Colomar 
Signed-off-by: Andrew Clayton

Isolation: Remove nxt_clone().

2022-11-19T02:27:22+00:00

Since the previous commit, this is no longer used.

Reviewed-by: Alejandro Colomar 
Signed-off-by: Andrew Clayton

Isolation: Switch to fork(2) & unshare(2) on Linux.

2022-11-18T23:53:30+00:00

On GitHub, @razvanphp & @hbernaciak both reported issues running the
APCu PHP module under Unit.

When using this module they were seeing errors like

  'apcu_fetch(): Failed to acquire read lock'

However when running APCu under php-fpm, everything was fine.

The issue turned out to be due to our use of SYS_clone breaking the
pthreads(7) API used by APCu.  Even if we had been using glibc's
clone(2) wrapper we would still have run into problems due to a known
issue there.

Essentially the problem is when using clone, glibc doesn't update the
TID cache, so the child ends up having the same TID as the parent and
that is used in various parts of pthreads(7) such as in the various
locking primitives, so when APCu was grabbing a lock it ended up using
the TID of the main unit process (rather than that of the php
application process that was grabbing the lock).

As a result, when one of the application processes went to grab either
a read or write lock, the lock was actually being attributed to the
main unit process.  If one process had acquired the write lock and
another process then tried to acquire a read or write lock, glibc would
return EDEADLK, as it believed the calling process already held the
write lock when in fact it didn't.

It seems the right way to do this is via fork(2) and unshare(2).  We
already use fork(2) on other platforms.

This requires a few tricks to keep the essence of the processes the same
as before when using clone

  1) We use the prctl(2) PR_SET_CHILD_SUBREAPER option (if it's
     available, since Linux 3.4) to make the main unit process inherit
     prototype processes after a double fork(2), rather than them being
     reparented to 'init'.

     This avoids needing to ^C twice to fully exit unit when running in
     the foreground.  It's probably also better if they maintain their
     parent/child relationship where possible.

  2) We use a double fork(2) technique on the prototype processes to
     ensure they themselves end up in a new PID namespace as PID 1 (when
     CLONE_NEWPID is being used).

     When using unshare(CLONE_NEWPID), the calling process is _not_
     placed in the namespace (as discussed in pid_namespaces(7)).  It
     only sets things up so that subsequent children are placed in a PID
     namespace.

     Having the prototype processes as PID 1 in the new PID namespace is
     probably a good thing and matches the behaviour of clone(2).  Also,
     some isolation tests break if the prototype process is not PID 1.

  3) Due to the above double fork(2) the main unit process loses track
     of the prototype process ID, which it needs to know.

     To solve this, we employ a simple pipe(2) between the main unit and
     prototype processes and pass the prototype grandchild PID from the
     parent of the second fork(2) before exiting.  This needs to be done
     from the parent and not the grandchild, as the grandchild will see
     itself having a PID of 1 while the main process needs its
     externally visible PID.
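The pipe(2) hand-off in point 3 can be sketched as follows. This is a simplified Python illustration, not Unit's implementation: the function name spawn_grandchild() is hypothetical, the unshare(CLONE_NEWPID) and prctl(2) steps are omitted because they need privileges, and the grandchild simply exits where a prototype process would carry on running.

```python
import os
import struct


def spawn_grandchild():
    """Double fork; return the grandchild's PID as seen by the caller."""
    rfd, wfd = os.pipe()

    child = os.fork()
    if child == 0:
        # Intermediate child: the parent of the second fork.
        os.close(rfd)
        grandchild = os.fork()
        if grandchild == 0:
            # Grandchild: stand-in for the prototype process.
            os.close(wfd)
            os._exit(0)
        # Report the grandchild's PID as seen from outside, then exit.
        os.write(wfd, struct.pack("i", grandchild))
        os._exit(0)

    # Original process: read the PID written by the intermediate child.
    os.close(wfd)
    data = os.read(rfd, struct.calcsize("i"))
    os.close(rfd)
    os.waitpid(child, 0)
    return struct.unpack("i", data)[0]
```

The PID is written by the intermediate child because, with a new PID namespace in place, the grandchild's own getpid() would return 1 rather than the value the main process needs.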

Link: 
Link: 
Closes: 
Reviewed-by: Alejandro Colomar 
Signed-off-by: Andrew Clayton

Isolation: Rename NXT_HAVE_CLONE -> NXT_HAVE_LINUX_NS.

2022-11-18T23:42:44+00:00

Due to the need to replace our use of clone/__NR_clone on Linux with
fork(2)/unshare(2) for enabling Linux namespaces(7) to keep the
pthreads(7) API working, let's rename NXT_HAVE_CLONE to
NXT_HAVE_LINUX_NS, i.e. name it after the feature, not how it's
implemented.  Then in future if we change how we do namespaces again we
don't have to rename this.

Reviewed-by: Alejandro Colomar 
Signed-off-by: Andrew Clayton

NJS: adding the missing vm destruction.

2023-01-30T03:16:01+00:00

This commit fixes the njs memory leaks that occurred during config
validation, config updating and HTTP request handling.

Python: ASGI: Don't log asyncio.get_running_loop() errors.

2023-02-07T13:11:10+00:00

This adds a check to nxt_python_asgi_get_event_loop() on the
event_loop_func name in the case that running that function fails, and
if it's get_running_loop() that failed we skip printing an error message
as this is often expected behaviour since the previous commit and we
don't want users reporting spurious bugs.

This check always happens regardless of Python version, although it
really only applies to Python >= 3.7.  There didn't seem much point
adding complexity to the code for what will be an ever diminishing
number of people running older Pythons.
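The idea can be sketched in Python as below. This is only an illustration of the logic, not the C code in nxt_python_asgi_get_event_loop(); the helper call_loop_func() and the errors list standing in for Unit's log are both hypothetical.

```python
import asyncio


def call_loop_func(name, errors):
    """Call asyncio.<name>(); record an error unless get_running_loop() failed."""
    try:
        return getattr(asyncio, name)()
    except RuntimeError as exc:
        # A failing get_running_loop() is expected when no loop is running
        # yet, so it is not reported.
        if name != "get_running_loop":
            errors.append(f"Python failed to call 'asyncio.{name}': {exc}")
        return None
```

Called outside of any running loop with "get_running_loop", this returns None and leaves the errors list empty.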

Reviewed-by: Alejandro Colomar 
Signed-off-by: Andrew Clayton

Python: ASGI: Switch away from asyncio.get_event_loop().

2023-01-20T03:33:37+00:00

Several users on GitHub reported issues with running Python ASGI apps on
Unit with Python 3.11.1 (this would also affect Python 3.10.9) with the
following error from Unit

  2023/01/15 22:43:22 [alert] 0#77128 [unit] Python failed to call 'asyncio.get_event_loop'

TL;DR

asyncio.get_event_loop() is currently broken due to the process of
deprecating part or all of it.

First some history.

In Unit we had this commit

  commit 8dcb0b9987033d0349a6ecf528014a9daa574787
  Author: Max Romanov 
  Date:   Thu Nov 5 00:04:59 2020 +0300

      Python: request processing in multiple threads.

One of the things this did was to create a new asyncio event loop in each
thread using asyncio.new_event_loop().

It's perhaps worth noting that all these asyncio.* functions are Python
functions that we call from the C code in Unit.

Then we had this commit

  commit f27fbd9b4d2bdaddf1e7001d0d0bc5586ba04cd4
  Author: Max Romanov 
  Date:   Tue Jul 20 10:37:54 2021 +0300

      Python: using default event_loop for main thread for ASGI.

This changed things so that Unit calls asyncio.get_event_loop() in the
_main_ thread (but still calls asyncio.new_event_loop() in the other
threads).

asyncio.get_event_loop() up until recently would either return an
already running event loop or return a newly created one.

This was done for $reasons that the commit message and GitHub issue #560
hint at. But the intimation is that there can already be an event loop
running from the application (I assume it's referring to the user's
application) at this point and, if there is, we should use it.

Now for the Python side of things.

On the main branch we had

  commit 172c0f2752d8708b6dda7b42e6c5a3519420a4e8
  Author: Serhiy Storchaka 
  Date:   Sun Apr 25 13:40:44 2021 +0300

      bpo-39529: Deprecate creating new event loop in asyncio.get_event_loop() (GH-23554)

This commit began the deprecation of asyncio.get_event_loop().

  commit fd38a2f0ec03b4eec5e3cfd41241d198b1ee555a
  Author: Serhiy Storchaka 
  Date:   Tue Dec 6 19:42:12 2022 +0200

      gh-93453: No longer create an event loop in get_event_loop() (#98440)

This turned asyncio.get_event_loop() into a RuntimeError _if_ there
isn't a current event loop.

  commit e5bd5ad70d9e549eeb80aadb4f3ccb0f2f23266d
  Author: Serhiy Storchaka 
  Date:   Fri Jan 13 14:40:29 2023 +0200

      gh-100160: Restore and deprecate implicit creation of an event loop (GH-100410)

This re-creates the event loop if there wasn't one and emits a
deprecation warning.

After at least the last two commits Unit no longer works with the Python
_main_ branch.

Meanwhile on the 3.11 branch we had

  commit 3fae04b10e2655a20a3aadb5e0d63e87206d0c67
  Author: Serhiy Storchaka 
  Date:   Tue Dec 6 17:15:44 2022 +0200

      [3.11] gh-93453: Only emit deprecation warning in asyncio.get_event_loop when a new event loop is created (#99949)

which is what caused our breakage, though perhaps unintentionally as we
get the following traceback

  Traceback (most recent call last):
    File "/usr/lib64/python3.11/asyncio/events.py", line 676, in get_event_loop
      f = sys._getframe(1)
          ^^^^^^^^^^^^^^^^
  ValueError: call stack is not deep enough
  2023/01/18 02:46:10 [alert] 0#180279 [unit] Python failed to call 'asyncio.get_event_loop'

However, regardless, it is clear we need to stop using
asyncio.get_event_loop().

One option is to switch to the higher level asyncio.run() API, however
that is a rather large change.

This commit takes the simpler approach of using
asyncio.get_running_loop() (which it seems get_event_loop() will
eventually be an alias of) in the _main_ thread to return the currently
running event loop, or if there is no current event loop, it will call
asyncio.new_event_loop() to return a newly created event loop.
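Expressed in Python rather than through the C API that Unit actually uses, the fallback looks roughly like this. The function name get_main_loop() is hypothetical and used only for illustration.

```python
import asyncio


def get_main_loop():
    """Return the running event loop if there is one, else a new loop."""
    try:
        # Raises RuntimeError when no loop is running in this thread.
        return asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.new_event_loop()
```

When called from plain synchronous code this returns a fresh loop that is not yet running. When called from inside a coroutine it returns the loop that is driving that coroutine.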

I believe this mimics the current behaviour. In my testing
get_event_loop() seemed to always return a newly created loop, as when
just calling get_running_loop() it would return NULL and we would fail
out.

When running two processes each with 2 threads we would get the
following loops with Python 3.11.0 and unpatched Unit

  <_UnixSelectorEventLoop running=False closed=False debug=False>
  <_UnixSelectorEventLoop running=False closed=False debug=False>
  <_UnixSelectorEventLoop running=False closed=False debug=False>
  <_UnixSelectorEventLoop running=False closed=False debug=False>

and with Python 3.11.1 and a patched Unit we would get

  <_UnixSelectorEventLoop running=False closed=False debug=False>
  <_UnixSelectorEventLoop running=False closed=False debug=False>
  <_UnixSelectorEventLoop running=False closed=False debug=False>
  <_UnixSelectorEventLoop running=False closed=False debug=False>

Tested-by: Rafał Safin 
Reviewed-by: Alejandro Colomar 
Signed-off-by: Andrew Clayton