mirror of https://github.com/healthchecks/healthchecks.git synced 2025-04-08 06:30:05 +00:00

Add "Specifying Run IDs" section in docs

Pēteris Caune 2022-11-10 18:34:35 +02:00
parent a26ca60046
commit 5a464f186f
3 changed files with 107 additions and 2 deletions

static/img/docs/run_ids.png (binary file, 79 KiB, not shown)

@ -41,4 +41,49 @@ more than 72 hours apart, they are assumed to be unrelated, and the duration is
not displayed.</p>
<p><img alt="List of checks with durations" src="IMG_URL/checks_durations.png" /></p>
<p>You can also see durations of the previous runs when viewing an individual check:</p>
<p><img alt="Log of received pings with durations" src="IMG_URL/details_durations.png" /></p>
<h2>Specifying Run IDs</h2>
<p>When several instances of the same job can run concurrently, the calculated run times
can come out wrong, as SITE_NAME cannot reliably determine which success event
corresponds to which start event. To work around this problem, the client can
optionally specify a run ID in the <code>rid</code> query parameter of any ping URL. When a
success event specifies the <code>rid</code> parameter, SITE_NAME will look for a
start event with a matching <code>rid</code> value when calculating the execution time.</p>
<p>The run IDs must be in a specific format: they must be UUID values in the canonical
textual representation (example: <code>728b3763-ea80-4113-9fc0-f49b3adf226a</code>, note no
curly braces, and no uppercase characters).</p>
<p>The client is free to pick run ID values randomly or use a deterministic process
to generate them. The only thing that matters is that the start and the success
pings of a single job execution use the same run ID value.</p>
<p>Below is an example shell script which generates the run ID using <code>uuidgen</code> and
makes HTTP requests using curl:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/sh</span>
<span class="nv">RID</span><span class="o">=</span><span class="sb">`</span>uuidgen<span class="sb">`</span>
<span class="c1"># send a start ping, specify rid parameter:</span>
curl -fsS -m <span class="m">10</span> --retry <span class="m">5</span> PING_URL/start?rid<span class="o">=</span><span class="nv">$RID</span>
<span class="c1"># ... FIXME: run the job here ...</span>
<span class="c1"># send the success ping, use same rid parameter:</span>
curl -fsS -m <span class="m">10</span> --retry <span class="m">5</span> PING_URL?rid<span class="o">=</span><span class="nv">$RID</span>
</code></pre></div>
<p>If the client specifies run IDs, SITE_NAME will display them in the "Events"
section in a shortened form:</p>
<p><img alt="Log of received pings with run IDs and durations" src="IMG_URL/run_ids.png" /></p>
<p>Also note how the execution times are available for both "success" events. If
run IDs were not used in this example, event #4 would not show an execution time
since it is not preceded by a "start" event.</p>
<h2>Alerting Logic When Using Run IDs</h2>
<p>If a job sends a "start" signal, but then does not send a "success"
signal within its configured grace time, SITE_NAME will assume the job
has failed and notify you. However, when using run IDs, there is an important
caveat: SITE_NAME <strong>will not monitor the execution times of all
concurrent job runs</strong>; it will only monitor the execution time of the
most recently started run.</p>
<p>To illustrate, let's assume a grace time of 1 minute and look at the above example
again. The run that ended with event #4 took 6 minutes 39 seconds and so overshot the
time budget of 1 minute. But SITE_NAME generated no alerts because <strong>the most recently started
run completed within the time limit</strong> (it took 37 seconds, which is less than 1 minute).</p>


@ -53,4 +53,64 @@ not displayed.
You can also see durations of the previous runs when viewing an individual check:
![Log of received pings with durations](IMG_URL/details_durations.png)
## Specifying Run IDs
When several instances of the same job can run concurrently, the calculated run times
can come out wrong, as SITE_NAME cannot reliably determine which success event
corresponds to which start event. To work around this problem, the client can
optionally specify a run ID in the `rid` query parameter of any ping URL. When a
success event specifies the `rid` parameter, SITE_NAME will look for a
start event with a matching `rid` value when calculating the execution time.
The run IDs must be in a specific format: they must be UUID values in the canonical
textual representation (example: `728b3763-ea80-4113-9fc0-f49b3adf226a`, note no
curly braces, and no uppercase characters).
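Tools that generate UUIDs do not all print them in lowercase (`uuidgen` on macOS prints uppercase, for example), so it can help to normalize and check a run ID before using it. The following is a minimal sketch, not an official client: the function names `normalize_rid` and `is_valid_rid` are hypothetical, and the pattern only checks the lowercase 8-4-4-4-12 hexadecimal layout described above. Whether SITE_NAME applies further checks is not covered here.

```bash
#!/bin/sh
# Hypothetical helpers for illustration; not part of any official client.

# Lowercase the hex digits of a UUID and strip curly braces, if present.
normalize_rid() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f' | tr -d '{}'
}

# Succeed only if the argument is 8-4-4-4-12 lowercase hexadecimal digits.
is_valid_rid() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

RID=$(normalize_rid "728B3763-EA80-4113-9FC0-F49B3ADF226A")
if is_valid_rid "$RID"; then
    echo "$RID"
fi
```

Normalizing once, right after generating the value, keeps the start and the success pings on exactly the same string.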
The client is free to pick run ID values randomly or use a deterministic process
to generate them. The only thing that matters is that the start and the success
pings of a single job execution use the same run ID value.
Below is an example shell script which generates the run ID using `uuidgen` and
makes HTTP requests using curl:
```bash
#!/bin/sh
RID=`uuidgen`
# send a start ping, specify rid parameter:
curl -fsS -m 10 --retry 5 PING_URL/start?rid=$RID
# ... FIXME: run the job here ...
# send the success ping, use same rid parameter:
curl -fsS -m 10 --retry 5 PING_URL?rid=$RID
```
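The script above sends the success ping unconditionally. As one possible variation, the sketch below wraps the job in a function and sends the success ping only when the job exits with status 0, so a failed run is left without a matching success event. The names `ping_url` and `run_with_rid` are hypothetical, and `PING_URL` remains a placeholder for the check's real ping URL.

```bash
#!/bin/sh
# Hypothetical helpers for illustration; not part of any official client.

# Build a ping URL that carries the run ID.
# $1 = base ping URL, $2 = suffix ("/start" or empty), $3 = run ID
ping_url() {
    printf '%s%s?rid=%s\n' "$1" "$2" "$3"
}

# Send a start ping, run the command, then send the success ping
# with the same run ID, but only if the command succeeded.
# $1 = base ping URL, remaining arguments = the command to run
run_with_rid() {
    base=$1
    shift
    # Lowercase the output, since run IDs must not contain uppercase characters.
    rid=$(uuidgen | tr 'A-F' 'a-f')
    curl -fsS -m 10 --retry 5 -o /dev/null "$(ping_url "$base" /start "$rid")"
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        curl -fsS -m 10 --retry 5 -o /dev/null "$(ping_url "$base" "" "$rid")"
    fi
    return "$status"
}
```

A call would look like `run_with_rid PING_URL /path/to/job.sh`. Quoting the URLs also keeps the shell from treating the `?` as a glob character.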
If the client specifies run IDs, SITE_NAME will display them in the "Events"
section in a shortened form:
![Log of received pings with run IDs and durations](IMG_URL/run_ids.png)
Also note how the execution times are available for both "success" events. If
run IDs were not used in this example, event #4 would not show an execution time
since it is not preceded by a "start" event.
## Alerting Logic When Using Run IDs
If a job sends a "start" signal, but then does not send a "success"
signal within its configured grace time, SITE_NAME will assume the job
has failed and notify you. However, when using run IDs, there is an important
caveat: SITE_NAME **will not monitor the execution times of all
concurrent job runs**; it will only monitor the execution time of the
most recently started run.
To illustrate, let's assume a grace time of 1 minute and look at the above example
again. The run that ended with event #4 took 6 minutes 39 seconds and so overshot the
time budget of 1 minute. But SITE_NAME generated no alerts because **the most recently started
run completed within the time limit** (it took 37 seconds, which is less than 1 minute).
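The arithmetic behind this example can be written out as a small sketch. It is a deliberately simplified model of the caveat, not SITE_NAME's actual implementation: the hypothetical function `would_alert` takes the grace time and the run durations in seconds, in the order the runs were started, and compares only the last one against the grace time.

```bash
#!/bin/sh
# Simplified model for illustration; not SITE_NAME's actual alerting code.
# $1 = grace time in seconds
# remaining arguments = run durations in seconds, in the order the runs started
would_alert() {
    grace=$1
    shift
    latest=0
    for duration in "$@"; do
        latest=$duration   # only the most recently started run is kept
    done
    [ "$latest" -gt "$grace" ]
}

# Grace time 60 s; the earlier run took 6 min 39 s (399 s), the latest took 37 s.
if would_alert 60 399 37; then
    echo "alert"
else
    echo "no alert"
fi
```

With these numbers the script prints `no alert`, even though the earlier run exceeded the grace time by more than five minutes.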