We used "body" to store the request body as text.
In 2022 we added "body_raw" and started using it to store the request
body as bytes.
In Python code we currently need to inspect both fields,
because the data could be in "body" (for old pings) or in
"body_raw" (for newer pings). My plan is to eventually get rid
of the "body" field, and have "body_raw" only. This data migration
is a step towards that: for any Ping objects that have non-empty
"body" field, it moves the data to the "body_raw" field. After
applying this migration, the "body" field should be empty (empty
string or null) for all Ping objects.
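A minimal sketch of the two code paths described above, assuming a hypothetical `Ping` dataclass in place of the Django model and UTF-8 as the text encoding. `get_body_bytes` shows the current read path that checks both fields. `move_body_to_raw` shows the per-row step the data migration performs. In an actual Django data migration, this step would typically run inside a `migrations.RunPython` operation over the pings whose "body" is non-empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ping:
    # Hypothetical stand-in for the Ping model: "body" holds text
    # (older pings), "body_raw" holds bytes (newer pings).
    body: Optional[str] = None
    body_raw: Optional[bytes] = None


def get_body_bytes(ping: Ping) -> Optional[bytes]:
    # Read path before the migration: the data may be in either field.
    if ping.body_raw is not None:
        return ping.body_raw
    if ping.body:
        return ping.body.encode()  # assumes UTF-8
    return None


def move_body_to_raw(ping: Ping) -> bool:
    # Per-row migration step: move a non-empty "body" into "body_raw"
    # and leave "body" empty. Returns True if the row was changed.
    if not ping.body:
        return False
    ping.body_raw = ping.body.encode()  # assumes UTF-8
    ping.body = None
    return True
```

Once every row has gone through `move_body_to_raw`, the read path only needs to look at `body_raw`. That is what makes it possible to drop the "body" field later.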
There are three related changes:
* Removed legacy timezones from hc.lib.tz.all_timezones
* Added data migration to update existing Check.tz values
* For backwards compatibility, added code to automatically
replace a legacy timezone with a canonical timezone when a
legacy timezone is passed to an API call
I used the timezone mapping on
https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
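A minimal sketch of the backwards-compatibility lookup, assuming a plain dict. The entries are a small illustrative subset of the "Link" rows on the Wikipedia list above. The names `LEGACY_TO_CANONICAL` and `canonical_tz` are hypothetical, and the project's actual mapping is larger and may be structured differently.

```python
# Illustrative subset of legacy -> canonical tz names (tz database links).
# Hypothetical names; the project's real mapping is larger.
LEGACY_TO_CANONICAL = {
    "US/Eastern": "America/New_York",
    "US/Pacific": "America/Los_Angeles",
    "Asia/Calcutta": "Asia/Kolkata",
    "Europe/Kiev": "Europe/Kyiv",
}


def canonical_tz(name: str) -> str:
    # Replace a legacy name with its canonical equivalent;
    # canonical (or unknown) names pass through unchanged.
    return LEGACY_TO_CANONICAL.get(name, name)
```

An API handler would call `canonical_tz` on the incoming `tz` value before validating it against `hc.lib.tz.all_timezones`. The data migration would apply the same lookup to existing `Check.tz` values.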
This is primarily to make notification lookups by code efficient.
We look up notifications by code in hc.api.views.bounces.
This field has a default value (uuid.uuid4), so any null values
will be filled with random UUIDs during migration.
* Add Check.last_start_rid field
* Fill Check.last_start_rid on every start event
* Clear Check.last_start on every "fail" event
* Clear Check.last_start on a success event if either of these is true:
- the event's rid matches Check.last_start_rid
- the event does not specify rid
In human terms, the alerting logic will be: we track the
execution time of the most recent "start" event only. It would
take a major redesign to track the execution time of all
concurrent "start" events and send alerts when *any* of them
overshoots the time budget. So, whenever we see a "start" event,
the timer resets.
Example:
* 00:00 client sends start signal with rid=A, timer starts
* 00:10 client sends start signal with rid=B, timer resets
* 00:20 client sends success signal with rid=A, timer
does not reset because rid A does not match the rid seen in
the most recent start signal (it was B)
* 00:30 the grace time runs out, the check's status shows
as started + failed
At this point the check can be reset to a healthy state in 3
different ways:
* send a success signal with rid=B
* send a failure signal with any rid value or without it
* send a success signal without a rid value
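The rules above can be sketched as a small state machine. This is a minimal illustration only. `CheckState` is a hypothetical dataclass standing in for the `Check.last_start` and `Check.last_start_rid` fields. The `grace` argument models the grace time as a plain `td` value, and the 15-minute grace used in the usage lines is an assumed value chosen to match the example timeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta as td
from typing import Optional


@dataclass
class CheckState:
    # Hypothetical stand-in for Check.last_start / Check.last_start_rid.
    last_start: Optional[datetime] = None
    last_start_rid: Optional[str] = None

    def on_start(self, now: datetime, rid: Optional[str] = None) -> None:
        # Every "start" event resets the timer and records its rid.
        self.last_start = now
        self.last_start_rid = rid

    def on_fail(self, rid: Optional[str] = None) -> None:
        # A "fail" event clears the timer, whatever its rid is.
        self.last_start = None

    def on_success(self, rid: Optional[str] = None) -> None:
        # A success event clears the timer only if it carries no rid,
        # or if its rid matches the most recent "start" event.
        if rid is None or rid == self.last_start_rid:
            self.last_start = None

    def is_overdue(self, now: datetime, grace: td) -> bool:
        # "started + failed": a run is in progress and has been
        # running for longer than the grace time.
        return self.last_start is not None and now - self.last_start > grace


# Replaying the example timeline (grace time of 15 minutes assumed):
t0 = datetime(2024, 1, 1)
check = CheckState()
check.on_start(t0, rid="A")                   # 00:00, timer starts
check.on_start(t0 + td(minutes=10), rid="B")  # 00:10, timer resets
check.on_success(rid="A")                     # 00:20, rid mismatch, no reset
print(check.is_overdue(t0 + td(minutes=30), grace=td(minutes=15)))  # True
```

Each of the three reset paths in the list above corresponds to one call: `on_success(rid="B")`, `on_fail(...)` with or without a rid, or `on_success()` without a rid.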
* Added duration to ping details. This is useful on devices with small screens: the duration is not visible in the main view there, but it can now be seen in the ping's details.
* Changed terms across the board from "delta" to "duration"
* timedelta is now consistently imported as "td" across the entire project (even in Django generated migration files)
When adding "NOT NULL" to multiple columns at once, Django
throws an error:
django.db.utils.OperationalError:
cannot ALTER TABLE "api_check" because it has
pending trigger events
A workaround is to modify columns one by one in
separate migrations.
- Refactor transport classes to raise exceptions
on delivery problems, instead of returning an error
message as a string. Exceptions can carry extra meta
information (see the TransportError.permanent field and the
MigrationRequiredError subclass). I considered attaching
the extra information to strings by subclassing str, but
using exceptions felt cleaner and less hacky.
- Add Channel.disabled field, for disabling integrations
on permanent errors. For example, if Slack returns
HTTP 404, we will now mark the integration as disabled
and will not make requests to that Slack endpoint again.
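A minimal sketch of how a permanent `TransportError` could lead to a disabled channel. It assumes a hypothetical `Channel` dataclass in place of the model and a hypothetical `post` function that simulates the HTTP response from a status code. The `TransportError` shape here (a `message` plus a `permanent` flag) is an assumption based on the description above. `MigrationRequiredError` is left out because its fields are not described.

```python
from dataclasses import dataclass


class TransportError(Exception):
    # Delivery failure; `permanent` marks errors that retrying won't fix.
    def __init__(self, message: str, permanent: bool = False) -> None:
        super().__init__(message)
        self.message = message
        self.permanent = permanent


@dataclass
class Channel:
    # Hypothetical stand-in for the Channel model.
    kind: str
    disabled: bool = False


def post(status_code: int) -> None:
    # Hypothetical: simulates handling of a Slack webhook response.
    if status_code == 404:
        # The endpoint is gone; retrying will not help.
        raise TransportError("Received status code 404", permanent=True)
    if status_code >= 400:
        raise TransportError(f"Received status code {status_code}")


def notify(channel: Channel, status_code: int) -> str:
    # Returns an error message ("" on success). Permanent errors
    # disable the channel so that no further requests are made.
    if channel.disabled:
        return "Channel is disabled"
    try:
        post(status_code)
    except TransportError as e:
        if e.permanent:
            channel.disabled = True
        return e.message
    return ""
```

Transient errors such as HTTP 500 are reported but leave the channel enabled. A 404 disables it, and subsequent `notify` calls return early without calling `post`.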