mirror of https://github.com/healthchecks/healthchecks.git synced 2025-04-07 22:25:35 +00:00
Commit graph

1101 commits

Author SHA1 Message Date
Pēteris Caune
b685e66b71
Add a workaround for reverse() omitting script prefix when on thread
https://code.djangoproject.com/ticket/35985

cc: 
2024-12-09 11:53:53 +02:00
Pēteris Caune
22268c1484
Move absolute URL construction to hc.lib.urls.absolute_reverse()
absolute_reverse() works the same as django.urls.reverse()
except it generates absolute URLs (starting with http[s]://)
2024-12-03 17:24:27 +02:00
Pēteris Caune
5e848f4976
Add index on api_flip (owner_id, created)
This helps queries in hc.front.views._get_events,
especially for checks that are flipping between up and down
states a lot.
2024-12-03 10:37:01 +02:00
Pēteris Caune
b328c8739f
Reduce the number of Check.get_status() calls 2024-11-14 13:33:21 +02:00
Pēteris Caune
5c67c94654
Add a missing article 2024-11-08 11:31:09 +02:00
Pēteris Caune
5912758a8a
Update email alerts to mention failure reason
cc: 
2024-11-08 11:20:44 +02:00
Pēteris Caune
9edae634c7
Add Flip.reason field
cc: 
2024-11-08 10:24:50 +02:00
Pēteris Caune
79da9e9f4f
Fix auto-fixable ruff warnings
(`ruff check --fix`)
2024-11-07 15:15:58 +02:00
Pēteris Caune
4907073c55
Remove unneeded quotes 2024-11-06 19:32:44 +02:00
Pēteris Caune
e048ec4c48
Fix "class Foo(object):" -> "class Foo:"
In Python 3 these are equivalent, and shorter is better.
2024-10-29 17:57:50 +02:00
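A quick way to check the equivalence claimed in e048ec4c48. The class names below are illustrative only and do not come from the repository.

```python
# In Python 3 every class is a new-style class, so both spellings
# produce a class whose only direct base is `object`.
class WithObject(object):
    pass


class Bare:
    pass


# Same bases, and the same tail of the method resolution order.
assert WithObject.__bases__ == Bare.__bases__ == (object,)
assert WithObject.__mro__[1:] == Bare.__mro__[1:] == (object,)
```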
Pēteris Caune
a6ca589b34
Fix pyright warning 2024-10-29 11:54:48 +02:00
Pēteris Caune
c372e3232f
Update MS Teams legacy webhook retirement date to Jan 2025
Microsoft pushed it forward again:
https://devblogs.microsoft.com/microsoft365dev/retirement-of-office-365-connectors-within-microsoft-teams/
2024-10-25 09:51:58 +03:00
Pēteris Caune
9e69b5b5f5
Fix smtp listener to reject email addresses with unexpected domain
cc: 
2024-10-21 17:48:57 +03:00
Pēteris Caune
84f22c8978
Fix type warnings 2024-10-21 17:34:02 +03:00
Pēteris Caune
a5d4dc2db5
Fix smtp listener to reject email addresses with non-UUID local parts
cc: 
2024-10-21 15:56:24 +03:00
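The two SMTP listener fixes (a5d4dc2db5 for non-UUID local parts, 9e69b5b5f5 for unexpected domains) could be sketched roughly as follows. This is not the project's code: the function name `extract_code` and the domain `hc.example.org` are hypothetical, chosen only for illustration.

```python
import uuid
from typing import Optional

# Hypothetical domain; a real deployment would take it from configuration.
EXPECTED_DOMAIN = "hc.example.org"


def extract_code(address: str, expected_domain: str = EXPECTED_DOMAIN) -> Optional[uuid.UUID]:
    """Return the UUID encoded in `address`, or None if the address should be rejected."""
    local, sep, domain = address.rpartition("@")
    if not sep or domain.lower() != expected_domain:
        return None  # missing "@" or unexpected domain
    try:
        # Note: uuid.UUID() also accepts some non-canonical spellings
        # (no hyphens, surrounding braces). A stricter check could
        # compare str(result) with the lowercased input.
        return uuid.UUID(local)
    except ValueError:
        return None  # local part is not a UUID
```

With these assumptions, `extract_code("hello@hc.example.org")` is rejected because of the local part, and a valid UUID at any other domain is rejected because of the domain.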
Pēteris Caune
c91213179f
Fix API to gracefully handle too long slugs 2024-10-16 12:35:30 +03:00
Pēteris Caune
8c210e151f
Update the Signal integration to retry on network errors 2024-10-14 11:19:37 +03:00
Pēteris Caune
4f9b0b11b9
Update Signal transport to log unexpected signal-cli replies
When signal-cli returns an error that we are not handling yet,
log the precise JSON message that signal-cli returns. This
is for debug & development: We can look at the logged messages
and see what additional special error handling may be needed.
2024-10-10 10:21:08 +03:00
Pēteris Caune
fd96cc794b
Remove unused bits 2024-10-04 17:34:30 +03:00
Pēteris Caune
a51420744c
Add RiskCheck: disable in SMS transport
This is to reduce the chance of hitting Twilio error 30453,
"Message couldn't be delivered".

https://www.twilio.com/docs/api/errors/30453
2024-10-02 17:01:23 +03:00
Pēteris Caune
de4c4897e3
Remove prunenotifications management command
Notifications are now cleaned up automatically during pinging.
2024-10-02 09:24:01 +03:00
Pēteris Caune
13f92b90ef
Update settings.py to read SECURE_PROXY_SSL_HEADER from env vars
And add it to docs.

And add a system check to make sure it, if set, is a tuple
with 2 elements.

cc: 
2024-10-01 19:13:26 +03:00
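A minimal sketch of what 13f92b90ef describes: reading the setting from an environment variable and checking that, if set, it is a 2-element tuple. The comma-separated format and both function names are assumptions made for this example, not taken from the commit.

```python
import os
from typing import Mapping, Optional, Tuple


def read_proxy_ssl_header(env: Mapping[str, str] = os.environ) -> Optional[Tuple[str, ...]]:
    """Parse SECURE_PROXY_SSL_HEADER from `env`; None means "not set"."""
    raw = env.get("SECURE_PROXY_SSL_HEADER")
    if not raw:
        return None
    # Assumed format: "HEADER_NAME,value",
    # for example "HTTP_X_FORWARDED_PROTO,https".
    return tuple(part.strip() for part in raw.split(","))


def is_valid(value: Optional[Tuple[str, ...]]) -> bool:
    """The check described in the commit: if set, it must be a tuple with 2 elements."""
    return value is None or (isinstance(value, tuple) and len(value) == 2)
```

Django itself expects this setting to be a `(header_name, value)` pair, which is why a value with one or three parts should fail the check.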
Pēteris Caune
e73d7a1ece
Remove pruneflips management command
Flips are now cleaned up automatically during pinging.
2024-10-01 15:33:56 +03:00
Pēteris Caune
2cb47d3742
Make the sorting of null values in Flip.select_channels() explicit 2024-09-12 10:52:06 +03:00
Pēteris Caune
f241d070e1
Update Flip.select_channels() to sort channels by last_notify_duration
If a check has multiple associated channels, and some are slow while
others are quick, handle the quick ones first.
2024-09-12 10:44:56 +03:00
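The ordering from f241d070e1, together with the explicit null handling from 2cb47d3742, can be illustrated in plain Python. This is only a sketch: `Channel` here is a stand-in NamedTuple rather than the real model, the duration is assumed to be a `timedelta`, and placing channels with no recorded duration first is an arbitrary choice for this example (the commit messages do not say which end the nulls go to).

```python
from datetime import timedelta
from typing import List, NamedTuple, Optional


class Channel(NamedTuple):
    # Stand-in for the real model; only the fields needed for sorting.
    name: str
    last_notify_duration: Optional[timedelta]


def quick_first(channels: List[Channel]) -> List[Channel]:
    """Sort ascending by duration, with unknown (None) durations first."""
    return sorted(
        channels,
        key=lambda c: (
            c.last_notify_duration is not None,      # False (None) sorts before True
            c.last_notify_duration or timedelta(0),  # placeholder keeps the key comparable
        ),
    )
```

Making the position of the `None` values part of the sort key, instead of relying on a backend default, is the point of the "explicit" follow-up commit.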
Pēteris Caune
f60af9a156
Update ntfy integration to give up db connection before network IO 2024-09-12 10:30:58 +03:00
Pēteris Caune
28af3720f4
Increase outgoing webhook timeout from 10 to 30 seconds
Also simplify the retry logic: each retry attempt is now
allowed to use the full 30 seconds. This means a single
webhook delivery can take up to 3*30=90 seconds.
2024-09-11 12:37:40 +03:00
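The simplified retry logic in 28af3720f4 amounts to a loop in which every attempt gets the full timeout. A sketch, where the `send` callable and the constant names are hypothetical and the attempt count of 3 is inferred from the "3*30=90" arithmetic in the message:

```python
from typing import Callable

TIMEOUT = 30  # seconds allowed per attempt (from the commit message)
ATTEMPTS = 3  # inferred from "3*30=90" in the commit message


def deliver(send: Callable[[int], bool]) -> bool:
    """Call send(timeout) up to ATTEMPTS times and stop at the first success."""
    for _ in range(ATTEMPTS):
        if send(TIMEOUT):
            return True
    return False


# Worst case: every attempt fails only after using its full timeout.
WORST_CASE_SECONDS = ATTEMPTS * TIMEOUT
```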
Pēteris Caune
13217af304
Add --pool parameter in manage.py sendalerts
If sendalerts receives this parameter, it reconfigures
settings.DATABASES to enable db connection pooling
(using psycopg_pool with default parameters).

This lets us use many concurrent worker threads but not
run out of database connections. For example, with
`--num-workers 100 --pool`, up to 100 worker threads can run
concurrently, but only 3 threads can get a database connection
from the pool, the rest have to wait. When a worker thread
gives up a connection (by calling `close_old_connections`),
another thread can continue.

A worker thread can give up a db connection before it is fully
finished if it anticipates a long network IO operation ahead.
The Webhook transport does this before making a curl call.

psycopg_pool's default pool size is 4 connections. One
connection is used up by the main thread, so 3 connections
are available for the worker threads.
2024-09-10 14:58:24 +03:00
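One way to picture the reconfiguration described in 13217af304 is as a small transformation of a `DATABASES`-style dict. This is a sketch under assumptions: the helper names are hypothetical, and whether sendalerts modifies the settings in exactly this way is not stated in the commit message. The `"pool": True` option itself is the one Django 5.1+ provides for the PostgreSQL backend with psycopg 3.

```python
import copy
from typing import Any, Dict


def with_pooling(databases: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return a copy of a DATABASES-style dict with pooling enabled on "default"."""
    result = copy.deepcopy(databases)
    options = result["default"].setdefault("OPTIONS", {})
    # `"pool": True` requests a psycopg_pool connection pool
    # with default parameters.
    options["pool"] = True
    return result


def worker_connections(pool_size: int = 4, used_by_main_thread: int = 1) -> int:
    """The arithmetic from the commit message: 4 - 1 = 3."""
    return pool_size - used_by_main_thread
```

So with `--num-workers 100 --pool`, the 100 worker threads share `worker_connections()` database connections, and the rest wait until a connection is given back.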
Pēteris Caune
8eecece0bb
Add db migration for the updated msteams name 2024-09-10 14:45:48 +03:00
Pēteris Caune
6bf588d984
Remove unused import 2024-09-04 10:49:09 +03:00
Pēteris Caune
9d4fc031aa
Fix sendalerts to check the self.shutdown flag more often 2024-09-03 10:30:18 +03:00
Pēteris Caune
3275e0ffaa
Update notify() to return logs instead of printing them 2024-09-03 10:23:15 +03:00
Pēteris Caune
8c56ca6dde
Update sendalerts to mark flip as processed on thread
Previously this was done in process_one_flip (so on the main thread).
The advantage of doing it this way is that the flip gets marked as processed
only when the thread has started and has acquired a db connection.
There is now a smaller pause between a sendalerts process claiming a
flip, and actually starting work on it.
2024-09-01 15:28:48 +03:00
Pēteris Caune
fd75049e0c
Fix type warnings 2024-08-31 19:23:10 +03:00
Pēteris Caune
a463daa775
Update Webhook transport to close db connection before network IO
Webhook requests can take 20+ seconds. During that time we hold
on to a database connection. With this commit, the Webhook transport
closes its DB connection before making a curl call.

With psycopg2 this does not have much effect. But with
psycopg 3 & connection pooling we will be able to use more
sendalerts workers than we have database connections. While one
worker is busy making a slow curl call, another worker can
grab its freed up connection and do some work.

Django's test runner is not happy with connections closed
mid-test, so I patched out close_old_connections() in affected tests.
2024-08-31 19:18:17 +03:00
Pēteris Caune
9803d77a1d
Set explicit max_workers value for ThreadPoolExecutor
This is a tricky one: the default value for max_workers is
None, but that does not mean "unlimited". In Python 3.8+ it
means "min(32, os.cpu_count() + 4)".

For example, on an 8-core CPU the effective value would be 8 + 4 = 12,
and passing anything above 12 to `--num-workers` would have no effect.
2024-08-31 19:11:39 +03:00
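The formula quoted in 9803d77a1d, and the fix of passing the value explicitly, can be sketched like this. The helper names are illustrative and not taken from the repository.

```python
import concurrent.futures


def default_max_workers(cpu_count: int) -> int:
    """The formula quoted in the commit message for max_workers=None."""
    return min(32, cpu_count + 4)


def make_executor(num_workers: int) -> concurrent.futures.ThreadPoolExecutor:
    # Passing max_workers explicitly sidesteps the CPU-derived default,
    # so a large requested worker count actually takes effect.
    return concurrent.futures.ThreadPoolExecutor(max_workers=num_workers)
```

For an 8-core machine `default_max_workers(8)` gives 12, which matches the example in the commit message.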
Pēteris Caune
4cd677536d
Remove sent notification counter
The counter was slightly wrong (it counted lost races as sent
notifications). Rather than complicating the code to make it correct,
let's just remove it :-)
2024-08-31 19:07:25 +03:00
Pēteris Caune
faa1a2c99f
Add logging for exceptions thrown inside notify() 2024-08-31 19:04:41 +03:00
Pēteris Caune
7641f2a9a1
Switch to using close_old_connections() instead of connection.close() 2024-08-31 19:02:11 +03:00
Pēteris Caune
d76dc53e49
Increase Signal send timeout to 60 seconds 2024-08-31 11:07:17 +03:00
Pēteris Caune
b1b0a57033
Tweak sendalerts log format 2024-08-30 17:00:30 +03:00
Pēteris Caune
8a3a9b2a7e
Fix code comments 2024-08-29 16:30:28 +03:00
Pēteris Caune
029881f3b9
Refactor sendalerts
* Remove the --no-loop and --no-threads arguments
* Use a threadpool to do multiple sends concurrently
* Add a new `--num-workers` argument. It limits how many flips we grab
  from the database and process concurrently.
* Do not prioritize flips with historically low send times any more
  (not as important now with concurrent sending, and simpler this way)
* Workers close db connections when they finish
  (to keep the number of idle connections low)

Note: concurrent.futures.ThreadPoolExecutor internally has an unbounded
queue, it will accept any amount of jobs and keep them queued. We don't
want that. We only want to grab a flip, and commit to processing it,
if we know there's a free worker for it. Therefore we're tracking the
number of jobs in flight using a semaphore (`self.seats`).
2024-08-29 16:20:36 +03:00
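The note in 029881f3b9 about tracking in-flight jobs with a semaphore can be illustrated with a small standalone sketch. Only the idea (a `seats` semaphore guarding submissions to a thread pool) comes from the commit message; the function `run_bounded` and its structure are assumptions for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_bounded(jobs: Iterable[Callable[[], None]], num_workers: int) -> None:
    """Run jobs on a thread pool, never having more than num_workers in flight."""
    seats = threading.BoundedSemaphore(num_workers)
    pending = iter(jobs)
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        while True:
            # Wait for a free seat *before* taking the next job, so we
            # only commit to a job once there is capacity for it.
            seats.acquire()
            job = next(pending, None)
            if job is None:
                seats.release()
                break
            future = executor.submit(job)
            # Give the seat back when the job finishes, whether it
            # returned normally or raised.
            future.add_done_callback(lambda _f: seats.release())
```

Because the next job is only pulled from `jobs` after a seat is acquired, a lazy iterable (for example, one that claims rows from a database) is never consumed faster than the workers can keep up.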
Pēteris Caune
3968a4f9e0
Update MS Teams Connector EOL date 2024-08-27 16:34:59 +03:00
Pēteris Caune
70b55a777b
Add migration which updates Channel.kind values
This is to go with 8054191be3,
and should have been in there :-)

cc: 
2024-08-17 12:12:47 +03:00
Pēteris Caune
d3ae4e7fac
Add support for $SLUG placeholder in webhook payloads
Fixes: 
2024-08-16 13:24:12 +03:00
Pēteris Caune
56862a1c49
Update NotificationsAdmin to use __ lookup in list_display 2024-08-07 17:39:17 +03:00
Pēteris Caune
42b733540d
Fix type annotation
It used the wrong model name, and neither I nor mypy noticed
until the upgrade to django-stubs 5.0.4
2024-07-29 09:50:56 +03:00
Pēteris Caune
7346994ae8
Fix field name in TypedDict used for type checking 2024-07-18 18:19:01 +03:00
Pēteris Caune
bdb6f18a3d
Add "uuid" field in API responses when read/write key is used
The API responses already contain ping_url, update_url, resume_url,
pause_url fields from which the UUID can be extracted, so we are
not exposing new information. The extraction can be finicky in,
say, shell-scripting scenarios. So for API user convenience we will
now also provide the check's code (UUID) as a separate field.

Fixes: 
2024-07-18 18:15:52 +03:00
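From an API client's point of view, the change in bdb6f18a3d might look like the sketch below. The function name is hypothetical, and the fallback assumes that `ping_url` ends in the check's UUID, which the commit message implies but does not spell out.

```python
import uuid
from typing import Any, Dict


def code_from_response(check: Dict[str, Any]) -> uuid.UUID:
    """Get a check's UUID from an API response dict."""
    if "uuid" in check:
        # The new, convenient path: the UUID is a separate field.
        return uuid.UUID(check["uuid"])
    # Older responses: take the last path segment of ping_url,
    # assuming it has the form ".../<uuid>".
    return uuid.UUID(check["ping_url"].rstrip("/").rsplit("/", 1)[-1])
```

The fallback branch is the kind of string handling the commit message calls finicky, especially in shell scripts.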