When signal-cli returns an error that we are not handling yet,
log the precise JSON message that signal-cli returns. This
is for debugging & development: we can look at the logged
messages and see what additional special error handling may be
needed.
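A minimal sketch of the idea (the error codes, function name and
logger setup here are illustrative assumptions, not the actual
signal-cli integration):

```python
import json
import logging

logger = logging.getLogger("signal")

# Hypothetical error codes we already handle specially:
HANDLED_ERROR_CODES = {-1, -3}

def handle_reply(reply: str) -> None:
    # `reply` is one raw JSON reply string from signal-cli.
    doc = json.loads(reply)
    error = doc.get("error")
    if error and error.get("code") not in HANDLED_ERROR_CODES:
        # Log the precise JSON message so we can review it later
        # and decide what special handling to add.
        logger.debug("unexpected signal-cli reply: %s", reply)
```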
Also simplify the retry logic: each retry attempt is now
allowed to use the full 30 seconds. This means a single
webhook delivery can take up to 3 * 30 = 90 seconds.
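Roughly like so (a sketch; the function name, the use of
requests, and the retry-on-any-failure behavior are
assumptions):

```python
import requests  # assumption: the HTTP call could equally be pycurl

def deliver(url: str, payload: bytes, tries: int = 3) -> bool:
    # Each attempt gets the full 30-second timeout, so a single
    # delivery can take up to 3 * 30 = 90 seconds in the worst case.
    for _ in range(tries):
        try:
            response = requests.post(url, data=payload, timeout=30)
            if response.ok:
                return True
        except requests.RequestException:
            pass  # retry on timeouts and connection errors
    return False
```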
If sendalerts receives the `--pool` parameter, it reconfigures
settings.DATABASES to enable db connection pooling
(using psycopg_pool with default parameters).
This lets us use many concurrent worker threads without
running out of database connections. For example, with
`--num-workers 100 --pool`, up to 100 worker threads can run
concurrently, but only 3 threads can get a database connection
from the pool; the rest have to wait. When a worker thread
gives up a connection (by calling `close_old_connections`),
another thread can continue.
A worker thread can give up a db connection before it is fully
finished if it anticipates a long network IO operation ahead.
The Webhook transport does this before making a curl call.
psycopg_pool's default pool size is 4 connections. One
connection is used up by the main thread, so 3 connections
are available for the worker threads.
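A sketch of how the reconfiguration could look, assuming
Django's built-in psycopg 3 pool support (where `"pool": True`
routes connections through psycopg_pool with its defaults); the
actual wiring may differ:

```python
from django.conf import settings

def enable_db_pooling() -> None:
    # Sketch: switch the default database over to psycopg_pool.
    # Assumes the psycopg 3 backend; with "pool": True, Django uses
    # psycopg_pool.ConnectionPool with default parameters (min_size=4,
    # hence the pool of 4 connections mentioned above).
    options = settings.DATABASES["default"].setdefault("OPTIONS", {})
    options["pool"] = True
```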
Previously the flip was marked as processed in process_one_flip
(so on the main thread). The advantage of doing it this way is
that the flip gets marked as processed only when the worker
thread has started and has acquired a db connection.
There is now a smaller pause between a sendalerts process
claiming a flip and actually starting work on it.
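Roughly (a sketch; the field name and the stub are placeholders
for the actual flip-handling code):

```python
from django.utils.timezone import now

def send_alerts_for(flip) -> None:
    ...  # placeholder for the actual notification work

def process_flip(flip) -> None:
    # Runs on a worker thread. The flip is marked as processed here,
    # i.e. only once the thread has started and holds a db connection.
    flip.processed = now()
    flip.save(update_fields=["processed"])
    send_alerts_for(flip)
```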
Webhook requests can take 20+ seconds. During that time we hold
on to a database connection. With this commit, the Webhook transport
closes its DB connection before making a curl call.
With psycopg2 this does not have much effect. But with
psycopg 3 & connection pooling we will be able to use more
sendalerts workers than we have database connections. While one
worker is busy making a slow curl call, another worker can
grab its freed-up connection and do some work.
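In the transport, this could look roughly like the following
(a sketch; the class shape and helper names are assumptions):

```python
from django.db import close_old_connections

class Webhook:
    # Sketch of the transport's notify method; helpers are assumed.
    def notify(self, check) -> None:
        url = self.prepare_url(check)    # assumed helper
        body = self.prepare_body(check)  # assumed helper
        # Give up our DB connection before the slow network call so
        # another worker thread can pick it up while we wait on IO.
        close_old_connections()
        self.post(url, data=body)  # the curl call, via an assumed helper
```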
Django's test runner is not happy with connections closed
mid-test, so I patched out close_old_connections() in affected tests.
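The patching looks something along these lines (the patch
target is an assumption about where the transport code lives):

```python
from unittest.mock import patch
from django.test import TestCase

class NotifyWebhookTestCase(TestCase):
    @patch("hc.api.transports.close_old_connections")
    def test_it_works(self, mock_close) -> None:
        ...  # exercise the Webhook transport as usual
```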
This is a tricky one: the default value for max_workers is
None. But it doesn't mean "unlimited"; in Python 3.8+ it
means `min(32, os.cpu_count() + 4)`.
For example, on an 8-core CPU the effective value would be
min(32, 8 + 4) = 12, and passing anything above 12 to
`--max-workers` would have no effect.
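To spell the stdlib behavior out:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# With max_workers=None, Python 3.8+ computes the limit itself:
effective = min(32, (os.cpu_count() or 1) + 4)
pool = ThreadPoolExecutor()  # same as ThreadPoolExecutor(max_workers=effective)
print(effective)  # 12 on an 8-core machine: min(32, 8 + 4)
```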
The counter was slightly wrong (it counted lost races as sent
notifications). Rather than complicating the code to make it
correct, let's just remove it :-)
* Remove the --no-loop and --no-threads arguments
* Use a threadpool to do multiple sends concurrently
* Add a new `--num-workers` argument. It limits how many flips we grab
from the database and process concurrently.
* Do not prioritize flips with historically low send times any more
(not as important now with concurrent sending, and simpler this way)
* Workers close db connections when they finish
(to keep the number of idle connections low)
Note: concurrent.futures.ThreadPoolExecutor internally has an unbounded
queue; it will accept any number of jobs and keep them queued. We don't
want that. We only want to grab a flip, and commit to processing it,
if we know there's a free worker for it. Therefore we track the
number of jobs in flight using a semaphore (`self.seats`).
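A sketch of the semaphore pattern (class and helper names are
placeholders, not the actual sendalerts code):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def claim_next_flip():
    # Placeholder: atomically claim one unprocessed flip from the
    # database and return it, or return None if there is no work.
    return None

class SendAlerts:
    def __init__(self, num_workers: int) -> None:
        self.executor = ThreadPoolExecutor(max_workers=num_workers)
        self.seats = threading.Semaphore(num_workers)

    def process_flip(self, flip) -> None:
        try:
            ...  # send the notifications for this flip
        finally:
            self.seats.release()  # this worker seat is free again

    def loop(self) -> None:
        while True:
            self.seats.acquire()  # only grab a flip if a worker is free
            flip = claim_next_flip()
            if flip is None:
                self.seats.release()  # no work; give the seat back
                time.sleep(2)
                continue
            self.executor.submit(self.process_flip, flip)
```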
The API responses already contain ping_url, update_url, resume_url
and pause_url fields, from which the UUID can be extracted, so we
are not exposing new information. But the extraction can be finicky
in, say, shell-scripting scenarios. So, for API users' convenience,
we will now also provide the check's code (UUID) as a separate field.
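For illustration (the field name "uuid", the endpoint and the
API key below are assumptions, not the documented API):

```python
import requests

# Sketch: read each check's code (UUID) straight from the new field
# instead of parsing it out of ping_url.
r = requests.get(
    "https://healthchecks.example.com/api/v1/checks/",
    headers={"X-Api-Key": "your-api-key"},
)
for check in r.json()["checks"]:
    print(check["uuid"])  # previously: extract from check["ping_url"]
```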
Fixes: #1007