The on_notify_done callback was accessing exception data incorrectly.
If there was an exception in the worker thread, it is re-raised when
calling future.result(), and we must catch it with `try ... except`
instead of calling future.exception().
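For reference, the shape of the fixed callback is roughly this (a
sketch; the real callback's signature and error handling differ):

```python
from concurrent.futures import Future

def on_notify_done(future: Future) -> None:
    try:
        # result() re-raises any exception raised in the worker thread
        future.result()
    except Exception:
        ...  # log / record the failure here
```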
Instead of relying on SCRIPT_NAME / FORCE_SCRIPT_NAME / PATH_INFO,
with their associated issues, update urls.py to add the subpath
to all routes. This allows us to get rid of several hacks:
* the uwsgi.ini magic which parses SITE_ROOT, sets SCRIPT_NAME
and fixes PATH_INFO
* set_script_prefix() in sendalerts
* chopping the subpath off a URL in hc.accounts.views._allow_redirect
The idea comes from @apollo13
in https://code.djangoproject.com/ticket/35985#comment:5
cc: #1091
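A minimal sketch of the urls.py approach (the `SITE_SUBPATH` setting
and the included URLconfs are illustrative, not the actual Healthchecks
code):

```python
from django.conf import settings
from django.urls import include, path

# Hypothetical setting holding the subpath, e.g. "hc/" or ""
subpath = getattr(settings, "SITE_SUBPATH", "")

urlpatterns = [
    # Every route gets the subpath baked in, so no SCRIPT_NAME
    # trickery is needed at the WSGI layer.
    path(subpath, include("hc.front.urls")),
    path(subpath + "api/", include("hc.api.urls")),
]
```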
If sendalerts receives this parameter, it reconfigures
settings.DATABASES to enable db connection pooling
(using psycopg_pool with default parameters).
This lets us run many concurrent worker threads without
running out of database connections. For example, with
`--num-workers 100 --pool`, up to 100 worker threads can run
concurrently, but only 3 of them can hold a database connection
from the pool at a time; the rest have to wait. When a worker thread
gives up its connection (by calling `close_old_connections`),
another thread can continue.
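A sketch of what the reconfiguration can look like (this assumes
Django 5.1+'s `pool` option for the psycopg backend; the real
command's code may differ):

```python
from django.conf import settings

def enable_db_pooling() -> None:
    # Turn on psycopg_pool with its default parameters for the
    # default database alias.
    options = settings.DATABASES["default"].setdefault("OPTIONS", {})
    options["pool"] = True
```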
A worker thread can give up a db connection before it is fully
finished if it anticipates a long network IO operation ahead.
The Webhook transport does this before making a curl call.
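The pattern looks roughly like this (names are illustrative; the real
Webhook transport differs):

```python
from django.db import close_old_connections

def notify(url: str) -> None:
    ...  # all required database reads happen before this point
    # Give the connection back so another worker can use it
    # while we wait on network IO.
    close_old_connections()
    ...  # the long network IO: the outgoing curl/webhook call
```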
psycopg_pool's default pool size is 4 connections. One
connection is used up by the main thread, so 3 connections
are available for the worker threads.
Previously this was done in process_one_flip (so on the main thread).
The advantage of doing it this way is that the flip gets marked as
processed only after the worker thread has started and acquired a db
connection.
There is now a smaller pause between a sendalerts process claiming a
flip and actually starting work on it.
This is a tricky one: the default value for max_workers is
None. But it doesn't mean "unlimited"; in Python 3.8+ it
means `min(32, os.cpu_count() + 4)`.
For example, on an 8-core CPU the effective value would be 8 + 4 = 12,
and passing anything above 12 to `--max-workers` would have no effect.
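A quick illustration (CPython's exact formula is
`min(32, (os.cpu_count() or 1) + 4)`; `_max_workers` is a private
attribute, used here only to inspect the computed value):

```python
import os
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()  # max_workers=None
print(executor._max_workers)  # e.g. 12 on an 8-core machine
print(min(32, (os.cpu_count() or 1) + 4))  # same value on Python 3.8-3.12
```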
The counter was slightly wrong (it counted lost races as sent
notifications). Rather than complicating the code to make it correct,
let's just remove it :-)
* Remove the --no-loop and --no-threads arguments
* Use a threadpool to do multiple sends concurrently
* Add a new `--num-workers` argument. It limits how many flips we grab
from the database and process concurrently.
* Do not prioritize flips with historically low send times any more
(not as important now with concurrent sending, and simpler this way)
* Workers close db connections when they finish
(to keep the number of idle connections low)
Note: concurrent.futures.ThreadPoolExecutor internally uses an unbounded
queue; it will accept any number of jobs and keep them queued. We don't
want that: we only want to grab a flip, and commit to processing it,
if we know there's a free worker for it. Therefore we track the
number of jobs in flight using a semaphore (`self.seats`).
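A sketch of the bounded-submission pattern (`claim_flip` and
`process_one` are illustrative stand-ins for the real sendalerts code):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 10
seats = threading.Semaphore(NUM_WORKERS)

def claim_flip():
    """Stand-in for claiming one unprocessed flip from the database."""
    return None

def process_one(flip) -> None:
    try:
        ...  # send the notification
    finally:
        seats.release()  # this seat is free again

with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
    while True:
        seats.acquire()  # block until a worker seat is free
        flip = claim_flip()  # only now commit to processing a flip
        if flip is None:
            seats.release()
            break
        pool.submit(process_one, flip)
```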
... and pass it to Transport.notify_flip().
This allows us to pass flip-specific information (the flip timestamp,
the new status) to transport classes.
This results in changes in other places too:
* curl.post() no longer accepts `data` as a positional argument;
it must now be passed as a keyword argument
* we need asserts and if clauses in a few places to make sure
we are not passing `None` arguments to hc.lib.curl.request
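An illustrative call-site change for `curl.post()` (the signature is
paraphrased from the description above, not copied from the source):

```python
from hc.lib import curl

url, payload = "https://example.org/hook", b"{}"
# Before: curl.post(url, payload)
# After: `data` must be passed by keyword
response = curl.post(url, data=payload)
```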
sendalerts had support for sending notifications
synchronously (with the `--no-threads` flag) and asynchronously using
threads (the default).
It turns out there was a bug in argument handling, and sendalerts
was always using the synchronous mode regardless of the
presence or absence of the `--no-threads` flag. Since no one seems to
have noticed, I removed the unused async code.
The smtpd management command runs an SMTP listener for receiving
ping signals as emails. If emails arrive infrequently, the
database connections can time out. A brute-force workaround for
this was to call `connections.close_all()` before making new
DB queries.
Code moved around during the migration to aiosmtpd, and it looks like
the workaround no longer worked (#847). This commit
replaces `connections.close_all()` with `connection.close()` and
moves it to the `_process_message()` function, which *hopefully* fixes
the problem.
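A sketch of the fix (the function signature is simplified; only the
placement of `connection.close()` matters):

```python
from django.db import connection

def _process_message(mailfrom: str, mailto: str, data: bytes) -> None:
    # Close a possibly timed-out connection up front; Django
    # transparently reopens it on the next query.
    connection.close()
    ...  # look up the check by the ping address and register the ping
```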