witten_borgmatic/docs/how-to/deal-with-very-large-backups.md
2024-11-24 19:40:00 -08:00

15 KiB

title eleventyNavigation
How to deal with very large backups
key parent order
📏 Deal with very large backups How-to guides 4

Biggish data

Borg itself is great for efficiently de-duplicating data across successive backup archives, even when dealing with very large repositories. But you may find that while borgmatic's default actions of create, prune, compact, and check works well on small repositories, it's not so great on larger ones. That's because running the default pruning, compact, and consistency checks take a long time on large repositories.

Prior to version 1.7.9 The default action ordering was prune, compact, create, and check.

A la carte actions

If you find yourself wanting to customize the actions, you have some options. First, you can run borgmatic's create, prune, compact, or check actions separately. For instance, the following optional actions are available (among others):

borgmatic create
borgmatic prune
borgmatic compact
borgmatic check

You can run borgmatic with only one of these actions provided, or you can mix and match any number of them in a single borgmatic run. This supports approaches like skipping certain actions while running others. For instance, this skips prune and compact and only runs create and check:

borgmatic create check

New in version 1.7.9 borgmatic now respects your specified command-line action order, running actions in the order you specify. In previous versions, borgmatic ran your specified actions in a fixed ordering regardless of the order they appeared on the command-line.

But instead of running actions together, another option is to run backups with create on a frequent schedule (e.g. with borgmatic create called from one cron job), while only running expensive consistency checks with check on a much less frequent basis (e.g. with borgmatic check called from a separate cron job).

New in version 1.8.5 Instead of (or in addition to) specifying actions on the command-line, you can configure borgmatic to skip particular actions.

Consistency check configuration

Another option is to customize your consistency checks. By default, if you omit consistency checks from configuration, borgmatic runs full-repository checks (repository) and per-archive checks (archives) within each repository, running the checks on a monthly basis. (See below about setting your own check frequency.)

But if you find that archive checks are too slow, for example, you can configure borgmatic to run repository checks only. Configure this in the consistency section of borgmatic configuration:

checks:
    - name: repository

Prior to version 1.8.0 Put this option in the consistency: section of your configuration.

Prior to version 1.6.2 The checks option was a plain list of strings without the name: part, and borgmatic ran each configured check every time checks were run. For example:

checks:
    - repository

Here are the available checks, roughly from fastest to slowest:

  • archives: Checks all of the archives' metadata in the repository.
  • repository: Checks the consistency of the whole repository. The checks run on the server and do not cause significant network traffic.
  • extract: Performs an extraction dry-run of the latest archive.
  • data: Verifies the data integrity of all archives contents, decrypting and decompressing all data.
  • spot: Compares file counts and contents between your source files and the latest archive.

Note that the data check is a more thorough version of the archives check, so enabling the data check implicitly enables the archives check as well.

See Borg's check documentation for more information.

Spot check

The various consistency checks all have trade-offs around speed and thoroughness, but most of them don't even look at your original source files—arguably one important way to ensure your backups contain the files you'll want to restore in the case of catastrophe (or an accidentally deleted file). Because if something goes wrong with your source files, most consistency checks will still pass with flying colors and you won't discover there's a problem until you go to restore.

New in version 1.8.10 That's where the spot check comes in. This check actually compares your source file counts and data against those in the latest archive, potentially catching problems like incorrect excludes, inadvertent deletes, files changed by malware, etc.

But because an exhaustive comparison of all source files against the latest archive might be too slow, the spot check supports sampling a percentage of your source files for the comparison, ensuring they fall within configured tolerances.

Here's how it works. Start by installing the xxhash OS package if you don't already have it, so the spot check can run the xxh64sum command and efficiently hash files for comparison. Then add something like the following to your borgmatic configuration:

checks:
    - name: spot
      count_tolerance_percentage: 10
      data_sample_percentage: 1
      data_tolerance_percentage: 0.5

The count_tolerance_percentage is the percentage delta between the source directories file count and the latest backup archive file count that is allowed before the entire consistency check fails. For instance, if the spot check runs and finds 100 source files on disk and 105 files in the latest archive, that would be within the configured 10% count tolerance and the check would succeed. But if there were 100 source files and 200 archive files, the check would fail. (100 source files and only 50 archive files would also fail.)

The data_sample_percentage is the percentage of total files in the source directories to randomly sample and compare to their corresponding files in the latest backup archive. A higher value allows a more accurate check—and a slower one. The comparison is performed by hashing the selected source files and counting hashes that don't match the latest archive. For instance, if you have 1,000 source files and your sample percentage is 1%, then only 10 source files will be compared against the latest archive. These sampled files are selected randomly each time, so in effect the spot check is probabilistic.

The data_tolerance_percentage is the percentage of total files in the source directories that can fail a spot check data comparison without failing the entire consistency check. The value must be lower than or equal to the data_sample_percentage, because data_tolerance_percentage only looks at at the sampled files as determined by data_sample_percentage.

All three options are required when using the spot check. And because the check relies on these configured tolerances, it may not be a set-it-and-forget-it type of consistency check, at least until you get the tolerances dialed in so there are minimal false positives or negatives. It is recommended you run borgmatic check several times after configuring the spot check, tweaking your tolerances as needed. For certain workloads where your source files experience wild swings of file contents or counts, the spot check may not suitable at all.

What if you add, delete, or change a bunch of your source files and you don't want the spot check to fail the next time it's run? Run borgmatic create to create a new backup, thereby allowing the next spot check to run against an archive that contains your recent changes.

Because the spot check only looks at the most recent archive, you may not want to run it immediately after a create action (borgmatic's default behavior). Instead, it may make more sense to run the spot check on a separate schedule from create.

Check frequency

New in version 1.6.2 You can optionally configure checks to run on a periodic basis rather than every time borgmatic runs checks. For instance:

checks:
    - name: repository
      frequency: 2 weeks
    - name: archives
      frequency: 1 month

Prior to version 1.8.0 Put this option in the consistency: section of your configuration.

This tells borgmatic to run the repository consistency check at most once every two weeks for a given repository and the archives check at most once a month. The frequency value is a number followed by a unit of time, e.g. 3 days, 1 week, 2 months, etc. The set of possible time units is as follows (singular or plural):

  • second
  • minute
  • hour
  • day
  • week (7 days)
  • month (30 days)
  • year (365 days)

The frequency defaults to always for a check configured without a frequency, which means run this check every time checks run. But if you omit consistency checks from configuration entirely, borgmatic runs full-repository checks (repository) and per-archive checks (archives) within each repository, at most once a month.

Unlike a real scheduler like cron, borgmatic only makes a best effort to run checks on the configured frequency. It compares that frequency with how long it's been since the last check for a given repository If it hasn't been long enough, the check is skipped. And you still have to run borgmatic check (or borgmatic without actions) in order for checks to run, even when a frequency is configured!

This also applies across configuration files that have the same repository configured. Make sure you have the same check frequency configured in each though—or the most frequently configured check will apply.

New in version 1.9.0To support this frequency logic, borgmatic records check timestamps within the ~/.local/state/borgmatic/checks directory. To override the ~/.local/state portion of this path, set the user_state_directory configuration option. Alternatively, set the XDG_STATE_HOME environment variable.

New in version 1.9.2The STATE_DIRECTORY environment variable also works for this purpose. It's set by systemd if StateDirectory=borgmatic is added to borgmatic's systemd service file.

Prior to version 1.9.0 borgmatic recorded check timestamps within the ~/.borgmatic directory. At that time, the path was configurable by the borgmatic_source_directory configuration option (now deprecated).

If you want to temporarily ignore your configured frequencies, you can invoke borgmatic check --force to run checks unconditionally.

New in version 1.8.6 borgmatic check --force runs check even if it's specified in the skip_actions option.

Check days

New in version 1.8.13 You can optionally configure checks to only run on particular days of the week. For instance:

checks:
    - name: repository
      only_run_on:
         - Saturday
         - Sunday
    - name: archives
      only_run_on:
         - weekday
    - name: spot
      only_run_on:
         - Friday
         - weekend

Each day of the week is specified in the current locale (system language/country settings). weekend and weekday are also accepted.

As with frequency, borgmatic only makes a best effort to run checks on the given day of the week. For instance, if you run borgmatic check daily, then every day borgmatic will have an opportunity to determine whether your checks are configured to run on that day. If they are, then the checks run. If not, they are skipped.

For instance, with the above configuration, if borgmatic is run on a Saturday, the repository check will run. But on a Monday? The repository check will get skipped. And if borgmatic is never run on a Saturday or a Sunday, that check will never get a chance to run.

Also, the day of the week configuration applies after any configured frequency for a check. So for instance, imagine the following configuration:

checks:
    - name: repository
      frequency: 2 weeks
      only_run_on:
         - Monday

If you run borgmatic daily with that configuration, then borgmatic will first wait two weeks after the previous check before running the check again—on the first Monday after the frequency duration elapses.

Running only checks

New in version 1.7.1 If you would like to only run consistency checks without creating backups (for instance with the check action on the command-line), you can omit the source_directories option entirely.

Prior to version 1.7.1 In older versions of borgmatic, instead specify an empty source_directories value, as it is a mandatory option there:

location:
    source_directories: []

Disabling checks

If that's still too slow, you can disable consistency checks entirely, either for a single repository or for all repositories.

New in version 1.8.5 Disabling all consistency checks looks like this:

skip_actions:
    - check

Prior to version 1.8.5 Use this configuration instead:

checks:
    - name: disabled

Prior to version 1.8.0 Put checks: in the consistency: section of your configuration.

Prior to version 1.6.2 checks: was a plain list of strings without the name: part. For instance:

checks:
    - disabled

If you have multiple repositories in your borgmatic configuration file, you can keep running consistency checks, but only against a subset of the repositories:

check_repositories:
    - path/of/repository_to_check.borg

Finally, you can override your configuration file's consistency checks and run particular checks via the command-line. For instance:

borgmatic check --only data --only extract

This is useful for running slow consistency checks on an infrequent basis, separate from your regular checks. It is still subject to any configured check frequencies unless the --force flag is used.

Troubleshooting

Broken pipe with remote repository

When running borgmatic on a large remote repository, you may receive errors like the following, particularly while "borg check" is validating backups for consistency:

    Write failed: Broken pipe
    borg: Error: Connection closed by remote host

This error can be caused by an ssh timeout, which you can rectify by adding the following to the ~/.ssh/config file on the client:

    Host *
        ServerAliveInterval 120

This should make the client keep the connection alive while validating backups.