mirror of
https://github.com/healthchecks/healthchecks.git
synced 2025-02-18 18:42:28 +00:00
128 lines
No EOL
7.7 KiB
Text
128 lines
No EOL
7.7 KiB
Text
<h1>SITE_NAME Documentation</h1>
|
|
<p>SITE_NAME is a service for monitoring cron jobs (<a href="monitoring_cron_jobs/">see guide</a>)
|
|
and similar periodic processes:</p>
|
|
<ul>
|
|
<li>SITE_NAME <strong>listens for HTTP requests ("pings")</strong> from your cron jobs and scheduled
|
|
tasks.</li>
|
|
<li>It <strong>keeps silent</strong> as long as pings arrive on time.</li>
|
|
<li>It <strong>raises an alert</strong> as soon as a ping does not arrive on time.</li>
|
|
</ul>
|
|
<p>SITE_NAME works as a <a href="https://en.wikipedia.org/wiki/Dead_man%27s_switch">dead man's switch</a> for processes that need to
|
|
run continuously or on a regular, known schedule. Some examples of jobs that would
|
|
benefit from SITE_NAME monitoring:</p>
|
|
<ul>
|
|
<li>filesystem backups, database backups</li>
|
|
<li>task queues</li>
|
|
<li>database replication monitoring scripts</li>
|
|
<li>report generation scripts</li>
|
|
<li>periodic data import and sync jobs</li>
|
|
<li>periodic antivirus scans</li>
|
|
<li>DDNS updater scripts</li>
|
|
<li>SSL renewal scripts</li>
|
|
</ul>
|
|
<p>SITE_NAME is <em>not</em> the right tool for:</p>
|
|
<ul>
|
|
<li>monitoring website uptime by probing it with HTTP requests</li>
|
|
<li>collecting application performance metrics</li>
|
|
<li>log aggregation</li>
|
|
</ul>
|
|
<h2>Concepts</h2>
|
|
<p>A <strong>Check</strong> represents a single service you want to monitor. For example, when
|
|
<a href="monitoring_cron_jobs/">monitoring cron jobs</a>, you would create a separate check for
|
|
each cron job to be monitored. Each check has a unique ping URL, schedule,
|
|
and associated integrations. For the available configuration options, see
|
|
<a href="configuring_checks/">Configuring checks</a>.</p>
|
|
<p>Each check is always in one of the following states, depicted by a status icon:</p>
|
|
<dl>
|
|
<dt><span class="status ic-new"></span></dt>
|
|
<dd><strong>New</strong>. A newly created check that has not received any pings yet. Each new
|
|
check you create will start in this state.</dd>
|
|
<dt><span class="status ic-up"></span></dt>
|
|
<dd><strong>Up</strong>. All is well. The last "success" signal has arrived on time.</dd>
|
|
<dt><span class="status ic-grace"></span></dt>
|
|
<dd><strong>Late</strong>. The "success" signal is due but has not arrived yet.
|
|
It is not yet late by more than the check's configured <strong>Grace Time</strong>.</dd>
|
|
<dt><span class="status ic-down"></span></dt>
|
|
<dd><strong>Down</strong>. The "success" signal has not arrived yet, and the Grace Time has elapsed.
|
|
When a check transitions into the "Down" state, SITE_NAME sends alert
|
|
messages via the configured integrations.</dd>
|
|
<dt><span class="status ic-paused"></span></dt>
|
|
<dd><strong>Paused</strong>. You can manually pause the monitoring of specific checks. For example,
|
|
if a frequently running cron job has a known problem, and a fix is in the works
|
|
but not yet ready, you can pause monitoring the corresponding check temporarily to
|
|
avoid unwanted alerts about a known issue.</dd>
|
|
<dt><span class="status ic-up"></span><div class="spinner started"></div></dt>
|
|
<dd>Additionally, if the most recent received signal is a "start" signal,
|
|
this will be indicated by three animated dots under the check's status icon.</dd>
|
|
</dl>
|
|
<hr />
|
|
<p><strong>Ping URL</strong>. Each check has a unique <strong>Ping URL</strong>. Clients (cron jobs, background
|
|
workers, batch scripts, scheduled tasks, web services) make HTTP requests to the
|
|
ping URL to signal a start of the execution, a success, or a failure.</p>
|
|
<p>SITE_NAME supports two ping URL formats:</p>
|
|
<ul>
|
|
<li><code>PING_ENDPOINT<uuid></code><br>
|
|
The check is identified by its UUID. Check UUIDs are assigned
|
|
automatically by SITE_NAME, and are guaranteed to be unique.</li>
|
|
<li><code>PING_ENDPOINT<project-ping-key>/<name-slug></code><br>
|
|
The check is identified by project's <strong>Ping key</strong> and check's
|
|
<strong>slug</strong> (user-chosen, URL-friendly identifier). Optionally supports auto-provisioning:
|
|
if you ping a slug value that does not have a corresponding check, SITE_NAME can
|
|
create the check automatically.</li>
|
|
</ul>
|
|
<p>You can append <code>/start</code>, <code>/fail</code> or <code>/<exitcode></code> to the base ping URL to send
|
|
"start" and "failure" signals. The "start" and "failure" signals are optional.
|
|
You don't have to use them, but you can gain additional monitoring insights
|
|
if you do use them. See <a href="measuring_script_run_time/">Measuring script run time</a> and
|
|
<a href="signaling_failures/">Signaling failures</a> for details.</p>
|
|
<p>You should treat check UUIDs and project Ping keys as secrets. If you make them public,
|
|
anybody can send telemetry signals to your checks and mess with your monitoring.</p>
|
|
<p>Read more about Ping URLs in <a href="http_api/">Pinging API</a>.</p>
|
|
<hr />
|
|
<p><strong>Grace Time</strong> is one of the configuration parameters you can set for each check.
|
|
It is the additional time to wait before sending an alert when a check
|
|
is late. Use this parameter to account for minor, expected deviations in job
|
|
execution times.</p>
|
|
<p>When a check is considered <em>late</em> depends on whether the check uses a simple
|
|
or cron schedule, and whether or not you are
|
|
<a href="measuring_script_run_time/">tracking job durations</a> using the "start" events.</p>
|
|
<p>For <strong>simple schedules</strong>, the check is late when the checks's configured period has passed.
|
|
For example, consider a periodic task that should run every hour, and the gaps between
|
|
runs should not deviate by more than 5 minutes (Period = 1 hour,
|
|
Grace Time = 5 minutes). And let's say the last successful ping arrived at 12:00.</p>
|
|
<ul>
|
|
<li>At 13:00 the check will be declared late (because 1 hour will have passed
|
|
since the last ping).</li>
|
|
<li>At 13:05 the check will be declared down and the alerts will go out (because
|
|
1 hour + 5 minutes will have passed since the last ping).</li>
|
|
</ul>
|
|
<p>For <strong>cron and OnCalendar schedules</strong>, the check enters the late state at the exact
|
|
moment when the current wall clock time matches the schedule. Let's consider a cron
|
|
job with the schedule <code>10 * * * *</code> (10 minutes past every hour) and grace time of 5 minutes.
|
|
And let's say the last successful ping arrived at 12:30.</p>
|
|
<ul>
|
|
<li>At 13:10 the check will be declared late (because 13:10 is the next scheduled time
|
|
the cron job is expected to send a ping according to the cron schedule).</li>
|
|
<li>At 13:15 the check will be declared down and the alerts will go out (because 5
|
|
minutes will have passed since the time the cron job was expected to check in).</li>
|
|
</ul>
|
|
<p>If you use "start" signals to <a href="measuring_script_run_time/">measure job execution time</a>,
|
|
Grace Time also sets the maximum allowed time gap between "start" and "success" signals.
|
|
If a job sends a "start" signal but does not send a "success" signal within grace time,
|
|
SITE_NAME will assume failure and send out alerts.</p>
|
|
<hr />
|
|
<p>An <strong>Integration</strong> is a specific method for delivering monitoring alerts when a check's
|
|
change states. SITE_NAME supports many types of integrations: email,
|
|
webhooks, SMS, Slack, PagerDuty, etc. You can set up multiple integrations.
|
|
For each check, you can specify which integrations it should use.</p>
|
|
<p>For more information on integrations, see
|
|
<a href="configuring_notifications/">Configuring notifications</a>.</p>
|
|
<hr />
|
|
<p><strong>Project</strong>. To keep things organized, you can group checks and integrations in <strong>Projects</strong>.
|
|
Your account starts with a single default project, but you can create
|
|
additional projects as needed. You can transfer existing checks between projects
|
|
while preserving their configuration and ping URLs.</p>
|
|
<p>Each project has a configurable name, a separate set of API keys, and a separate
|
|
project team. The project's team is the set of people you have granted read-only or
|
|
read-write access to the project.</p>
|
|
<p>For more information on projects, see <a href="projects_teams/">Projects and teams</a>.</p> |