healthchecks_healthchecks/hc/lib/tests
Pēteris Caune b5d4f2aa74
Implement S3 outage mitigation
The mitigation is to not attempt GetObject calls if there have
been more than 3 S3 errors in the past minute. The implementation
uses the TokenBucket class that we normally use for rate-limiting.

An example scenario this is trying to avoid is:

* The S3 service becomes unavailable for 10 straight minutes.
  Each S3 request hangs until we hit the configured timeout
  (settings.S3_TIMEOUT).
* A client is frequently requesting the "Get ping's logged body"
  API call. Each call causes one webserver process to become
  busy for S3_TIMEOUT seconds.
* All workers become busy, request backlog fills up, our service
  starts returning 5xx errors.

With the mitigation, during an S3 outage, only the calls that
retrieve a ping's logged body will return 503; the rest of the
service will (hopefully) work normally.
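
The idea can be illustrated with a minimal, self-contained sketch. It is
not the code from this commit: the real implementation uses the project's
TokenBucket class, whereas ErrorWindow, GetObjectSkipped and
guarded_get_object below are hypothetical names, and the in-memory deque
is an assumption made only to keep the example runnable on its own.

```python
import time
from collections import deque


class GetObjectSkipped(Exception):
    """Raised when a call is skipped because of recent S3 errors."""


class ErrorWindow:
    """Hypothetical in-memory stand-in for the TokenBucket-based check."""

    def __init__(self, max_errors=3, window=60.0, clock=time.monotonic):
        self.max_errors = max_errors  # errors tolerated per window
        self.window = window  # window length in seconds
        self.clock = clock  # injectable so tests can fake time
        self.errors = deque()  # timestamps of recent errors

    def record_error(self):
        self.errors.append(self.clock())

    def tripped(self):
        # Forget errors that are older than the window.
        cutoff = self.clock() - self.window
        while self.errors and self.errors[0] <= cutoff:
            self.errors.popleft()
        # "More than 3 errors in the past minute" means: stop calling S3.
        return len(self.errors) > self.max_errors


def guarded_get_object(fetch, errors):
    """Call fetch() unless too many recent calls have failed."""
    if errors.tripped():
        # Fail fast instead of tying up a worker until the S3 timeout.
        raise GetObjectSkipped("too many recent S3 errors")
    try:
        return fetch()
    except Exception:
        errors.record_error()
        raise
```

In this sketch a view would catch GetObjectSkipped and return 503
right away, so a worker is held for the full S3 timeout only by the
first few failing requests in any given minute.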

Fixes: #1114
2025-01-13 14:21:42 +02:00
__init__.py Tests for badges 2016-07-03 19:24:44 +03:00
test_badges.py Update settings.py to allow subpath in SITE_ROOT 2024-12-04 10:37:52 +02:00
test_curl.py Fix "class Foo(object):" -> "class Foo:" 2024-10-29 17:57:50 +02:00
test_date.py Increase the precision in hc.lib.date.format_approx_duration 2023-10-02 12:50:59 +03:00
test_emails.py Improve type hints 2023-09-05 13:31:59 +03:00
test_html.py Improve type hints 2023-09-05 13:31:59 +03:00
test_s3.py Implement S3 outage mitigation 2025-01-13 14:21:42 +02:00
test_signing.py Improve type hints 2023-09-05 13:31:59 +03:00
test_statsd.py Fix unclosed sockets in statsd tests 2024-06-27 11:03:29 +03:00
test_string.py Improve type hints 2023-09-05 13:31:59 +03:00