How to automate backups of a PostgreSQL server using Barman/barman-cloud-backup to S3
I was surprised not to find many up-to-date instructions on this. I have a few basic requirements:
- Back up daily to an S3 bucket
- Keep a certain number of backups
- Run automatically, preferably using systemd rather than cron, as it’s easier to set up and troubleshoot
- Use a user with least privileges on the database, operating system, and in AWS/S3
- Send the results of each backup activity to healthchecks.io
After a bit of playing around, I decided to use Barman for the backups – it’s significantly easier to configure and use than pgBackRest, and has native support for backing up to S3, point-in-time restore, and more. The major downside compared to, say, running pg_dump every night, is that it requires an identical setup to restore to – identical hardware and PostgreSQL version. Least privilege in the database is tricky – to be able to back up things like roles, the account basically needs full access to all schemas. The Barman documentation says it should run as the same user as PostgreSQL, postgres.
Step 1: Create an S3 bucket
This one’s pretty simple. Just follow the instructions on the Amazon website.
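If you’d rather script it, the same thing can be done with the AWS CLI (installed in step 5) – a minimal sketch, with the bucket name and region as placeholders:
aws s3api create-bucket \
    --bucket <container_name> \
    --region eu-west-2 \
    --create-bucket-configuration LocationConstraint=eu-west-2
(For us-east-1, omit the --create-bucket-configuration flag.)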
Step 2: Create an IAM Policy to grant access to the bucket
We use an IAM policy to provide only the specific access that the service account needs. Go to the IAM console, select “Policies” on the left, and “Create new”. This is the template. Substitute <container_name> for your bucket name, obviously:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::<container_name>/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<container_name>"
            ]
        }
    ]
}
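If you prefer the CLI here too, the policy can be created from a file – a sketch, assuming you’ve saved the JSON above as backup-policy.json and picked the hypothetical name postgres-backup-s3:
aws iam create-policy \
    --policy-name postgres-backup-s3 \
    --policy-document file://backup-policy.json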
Step 3: Create a new S3 user, assign the policy and generate credentials
In the IAM console, select Users > Create user. Give them a unique name. Do NOT grant console access. Click Next. On the “Set permissions” page, select “Attach policies directly” and attach the policy you just created. It’s easier if you “Filter by Type” and select “Customer managed”. Select Next, then Review and Create. Let’s assume you’ve created a user called backup_user.
Once you’ve created backup_user, click on their name in the list and go to the “Security Credentials” tab. Click “Create Access Key” and then select “Other” from the list of options. We need a long-lived key, so this is the best approach (unless you want to go and re-authenticate them every month). Create the access key, then copy and note down both the Access Key ID and Secret Access Key. Do this now – you won’t be able to access them again and you’ll need to regenerate them.
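For reference, the CLI equivalent – the policy ARN is whatever your create-policy call returned:
aws iam create-user --user-name backup_user
aws iam attach-user-policy --user-name backup_user \
    --policy-arn arn:aws:iam::<account_id>:policy/postgres-backup-s3
# note the AccessKeyId and SecretAccessKey in the output – the secret can’t be retrieved later
aws iam create-access-key --user-name backup_user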
Step 4: Create a new check on healthchecks.io
I use healthchecks.io to keep track of all the scheduled tasks and processes I’m expecting to run. Log in and create a new health check. Note the URL.
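You can ping the check manually first to make sure the URL works – healthchecks.io will show the ping in its log:
curl -fsS -m 10 --retry 3 https://hc-ping.com/<UUID>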
Step 5: Install AWS CLI on the server
I found that this mostly went as expected. I followed the instructions on the AWS website; however, as I’d hardened my server using the Ubuntu CIS hardening baseline, I had to set some additional permissions:
sudo chmod -R 755 /usr/local/aws-cli
Step 6: Authenticate your new user with IAM credentials
Run aws configure. Enter the Access Key ID and Secret Access Key recorded in the step above. This generates a file at ~/.aws/credentials which contains these details. Later we’ll copy this to our postgres user’s home directory – but first we need to test our backup.
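Before moving on, it’s worth a quick sanity check that the credentials work and can see the bucket:
# should return the ARN of backup_user
aws sts get-caller-identity
# should list the (currently empty) bucket without an access error
aws s3 ls s3://<container_name>/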
Step 7: Install prerequisites for the python-snappy compression library
We’re going to use the snappy compression algorithm because of its significant performance improvement over the defaults, while still achieving approximately a 2:1 compression ratio (saving on both egress and S3 storage costs). First, install the required library and pip:
sudo apt-get install libsnappy-dev python3-pip
Then we install the package – we’ll need to do this again for the postgres user later, as the package is installed to user packages, not site packages.
pip install python-snappy
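A quick way to confirm the install worked – python-snappy is imported as snappy:
python3 -c "import snappy" && echo "python-snappy OK"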
Step 8: Download and install Barman
Barman is super easy to install. In my server setup, I added the PostgreSQL repos to my server – if you haven’t added the repo, follow the instructions there (which are slightly different to the ones on the PostgreSQL wiki), then simply install it – we’ll also install the cloud CLI, allowing us to back up to S3:
sudo apt-get install barman barman-cli-cloud
Although the documentation says we should configure Barman specifically for local backup by setting backup_method to local-rsync for our local server in a specific configuration file, we don’t actually need to do that – barman-cloud-backup is a standalone script that simply uses Barman. We can quickly test our backup. Note I’ve already set up a .pgpass file for the postgres_admin user:
rob@pg:~$ sudo -E barman-cloud-backup -v --cloud-provider aws-s3 --snappy --host localhost -U postgres_admin -d postgres s3://<container_name>/barman pg
2023-11-07 23:01:40,171 [1139430] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:01:40,749 [1139430] INFO: Starting backup '20231107T230140'
2023-11-07 23:01:41,408 [1139430] INFO: Uploading 'pgdata' directory '/mnt/postgres/postgresql/15/main' as 'data.tar.snappy'
2023-11-07 23:01:51,430 [1139436] INFO: Upload process started (worker 1)
2023-11-07 23:01:51,428 [1139435] INFO: Upload process started (worker 0)
2023-11-07 23:01:51,533 [1139436] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:01:51,534 [1139435] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:01:51,680 [1139435] INFO: Uploading 'barman/pg/base/20231107T230140/data.tar.snappy', part '1' (worker 0)
2023-11-07 23:01:58,138 [1139436] INFO: Uploading 'barman/pg/base/20231107T230140/data.tar.snappy', part '2' (worker 1)
...
2023-11-07 23:12:38,601 [1139436] INFO: Uploading 'barman/pg/base/20231107T230140/data.tar.snappy', part '278' (worker 1)
2023-11-07 23:12:41,232 [1139430] INFO: Uploading 'pg_control' file from '/mnt/postgres/postgresql/15/main/global/pg_control' to 'data.tar.snappy' with path 'global/pg_control'
2023-11-07 23:12:41,248 [1139430] INFO: Uploading 'config_file' file from '/etc/postgresql/15/main/postgresql.conf' to 'data.tar.snappy' with path 'postgresql.conf'
2023-11-07 23:12:41,249 [1139430] INFO: Uploading 'hba_file' file from '/etc/postgresql/15/main/pg_hba.conf' to 'data.tar.snappy' with path 'pg_hba.conf'
2023-11-07 23:12:41,249 [1139430] INFO: Uploading 'ident_file' file from '/etc/postgresql/15/main/pg_ident.conf' to 'data.tar.snappy' with path 'pg_ident.conf'
2023-11-07 23:12:41,250 [1139430] INFO: Stopping backup '20231107T230140'
2023-11-07 23:12:41,545 [1139430] INFO: Restore point 'barman_20231107T230140' successfully created
2023-11-07 23:12:41,546 [1139430] INFO: Uploading 'backup_label' file to 'data.tar.snappy' with path 'backup_label'
2023-11-07 23:12:41,546 [1139430] INFO: Marking all the uploaded archives as 'completed'
2023-11-07 23:12:41,547 [1139435] INFO: Uploading 'barman/pg/base/20231107T230140/data.tar.snappy', part '279' (worker 0)
2023-11-07 23:12:41,745 [1139436] INFO: Completing 'barman/pg/base/20231107T230140/data.tar.snappy' (worker 1)
2023-11-07 23:12:41,880 [1139430] INFO: Calculating backup statistics
2023-11-07 23:12:41,886 [1139430] INFO: Uploading 'barman/pg/base/20231107T230140/backup.info'
2023-11-07 23:12:42,016 [1139430] INFO: Backup end at LSN: 52/B91715B0 (0000000100000052000000B9, 001715B0)
2023-11-07 23:12:42,017 [1139430] INFO: Backup completed (start time: 2023-11-07 23:01:40.749792, elapsed time: 11 minutes, 1 second)
2023-11-07 23:12:42,021 [1139435] INFO: Upload process stopped (worker 0)
2023-11-07 23:12:42,022 [1139436] INFO: Upload process stopped (worker 1)
Step 9: Share AWS credentials with the postgres user
Barman and barman-cloud-backup both require read access to the PostgreSQL storage, so we need to run our backup job as the postgres user. To make this work, we’ll copy our AWS credentials to them:
sudo mkdir ~postgres/.aws
sudo cp ~/.aws/credentials ~postgres/.aws/credentials
sudo chmod 0600 ~postgres/.aws/credentials
sudo chown -R postgres: ~postgres/.aws
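To confirm the copy worked, check that the postgres user can now authenticate to AWS (-H makes sure $HOME points at their credentials file):
sudo -H -u postgres aws sts get-caller-identity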
Step 10: Install python-snappy as user postgres
We quickly need to log in and install the python-snappy package for the postgres user. First, log in as them:
sudo -i -u postgres
If you get this error on logging in as the user:
rob@pg:~$ sudo -i -u postgres
sudo: unable to change directory to /var/lib/postgresql: No such file or directory
you’ll need to create the user’s home directory. First, log out of the postgres user, then check the home directory:
rob@pg:~$ getent passwd postgres
postgres:x:116:122:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash
then create it:
sudo mkdir -p /var/lib/postgresql
sudo chown postgres:postgres /var/lib/postgresql
Then once logged in as them, install the package:
pip install python-snappy
Step 11: Create backup service
We want the backup to run on a schedule, so we’ll create three files: a backup script, a systemd service, and a systemd timer to trigger it. Firstly, we’ll check the home directory of the postgres user:
postgres@pg:~$ getent passwd postgres
postgres:x:116:122:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash
We can see that it’s /var/lib/postgresql. If yours is different, adjust these scripts. We need to use the absolute path because ~ won’t be expanded when running as a service. Create this file with sudo nano ~postgres/backup-script.sh, obviously substituting your S3 bucket, Healthchecks.io UUID and retention policy. We’re using peer authentication to allow the postgres user to sign in without a password:
#!/bin/bash
# Variables
BACKUP_DIR="/var/lib/postgresql/backup"
DATE_SUFFIX=$(date +%F_%H-%M-%S)
LOG_FILE="$BACKUP_DIR/barman_backup_log_$DATE_SUFFIX.txt"
S3_BUCKET="s3://<container_name>/barman"
HEALTHCHECK_URL="https://hc-ping.com/<UUID>"
SERVER_NAME="pg"
RETENTION_POLICY="RECOVERY WINDOW OF 30 DAYS" # Adjust the retention policy as needed
RETAIN_LOG_DAYS=7
# create backup temp dir if it doesn't exist
mkdir -p "$BACKUP_DIR"
# Redirect all output to log file
exec > "$LOG_FILE" 2>&1
# Function to send log to healthchecks.io
send_log() {
local url="$1"
curl -fsS --retry 3 -m 10 -X POST -H "Content-Type: text/plain" --data-binary "@$LOG_FILE" "$url"
}
# Perform backup with Barman
barman-cloud-backup -v --cloud-provider aws-s3 --snappy -d postgres --port 1234 "$S3_BUCKET" "$SERVER_NAME" || {
send_log "$HEALTHCHECK_URL/fail"
exit 1
}
# Delete old backups according to retention policy
barman-cloud-backup-delete --cloud-provider aws-s3 --retention-policy "$RETENTION_POLICY" "$S3_BUCKET" "$SERVER_NAME" || {
send_log "$HEALTHCHECK_URL/fail"
exit 1
}
# Notify healthchecks.io of success and send log
send_log "$HEALTHCHECK_URL"
# Finally, delete old log files in BACKUP_DIR
find "$BACKUP_DIR" -type f -name 'barman_backup_log_*.txt' -mtime +$RETAIN_LOG_DAYS -exec rm -f {} \;
Make sure that the postgres user owns it and it’s executable:
sudo chown -R postgres: ~postgres/backup-script.sh
sudo chmod +x ~postgres/backup-script.sh
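Before relying on systemd, you can run the script once by hand as the postgres user (-H again, so $HOME points at their AWS credentials):
sudo -H -u postgres ~postgres/backup-script.sh
# all output goes to the log file, so check the exit code
echo $?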
Create this file as /etc/systemd/system/barman-cloud-backup.service:
[Unit]
Description=Barman Cloud Backup Service
[Service]
Type=oneshot
ExecStart=/var/lib/postgresql/backup-script.sh
User=postgres
Test the service with sudo systemctl start barman-cloud-backup. You can check the status using systemctl too – although you’ll need to do it from a second terminal, as the service is non-forking and start blocks until it completes. Here we can see that the service is running:
rob@pg:~$ sudo systemctl status barman-cloud-backup
● barman-cloud-backup.service - Barman Cloud Backup Service
Loaded: loaded (/etc/systemd/system/barman-cloud-backup.service; static)
Active: activating (start) since Tue 2023-11-07 23:34:26 UTC; 16s ago
Main PID: 1143268 (backup-script.s)
Tasks: 2 (limit: 2220)
Memory: 50.7M
CPU: 10.325s
CGroup: /system.slice/barman-cloud-backup.service
├─1143268 /bin/bash /var/lib/postgresql/backup-script.sh
└─1143271 /usr/bin/python3 /usr/bin/barman-cloud-backup -v --cloud-provider aws-s3 --snappy -d postgres --port 1234 s3://<container_name>/barman pg
Nov 07 23:34:26 pg systemd[1]: Starting Barman Cloud Backup Service...
We can check the log file too:
rob@pg:~$ sudo ls -ltr ~postgres/backup
total 36
-rw-r--r-- 1 postgres postgres 36296 Nov 7 23:48 barman_backup_log_2023-11-07_23-34-25.txt
rob@pg:~$ sudo cat ~postgres/backup/barman_backup_log_2023-11-07_23-34-25.txt
2023-11-07 23:34:26,705 [1143271] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:34:27,263 [1143271] INFO: Starting backup '20231107T233427'
2023-11-07 23:34:33,420 [1143271] INFO: Uploading 'pgdata' directory '/mnt/postgres/postgresql/15/main' as 'data.tar.gz'
2023-11-07 23:35:04,095 [1143316] INFO: Upload process started (worker 1)
2023-11-07 23:35:04,097 [1143315] INFO: Upload process started (worker 0)
2023-11-07 23:35:04,213 [1143316] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:35:04,220 [1143315] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:35:04,352 [1143316] INFO: Uploading 'barman/pg/base/20231107T233427/data.tar.gz', part '1' (worker 1)
2023-11-07 23:35:35,917 [1143315] INFO: Uploading 'barman/pg/base/20231107T233427/data.tar.gz', part '2' (worker 0)
2023-11-07 23:36:03,578 [1143316] INFO: Uploading 'barman/pg/base/20231107T233427/data.tar.gz', part '3' (worker 1)
...
Eventually, the backup will complete and we can check it in Healthchecks.io. We can also use barman-cloud-backup-list to list the backups:
rob@pg:~$ barman-cloud-backup-list s3://<container_name>/barman pg
Backup ID End Time Begin Wal Archival Status Name
20231023T132628 2023-10-23 13:33:25 000000010000004E00000060
20231103T130531 2023-11-03 13:22:11 000000010000005200000081
20231103T135700 2023-11-03 14:11:59 000000010000005200000083
20231107T211340 2023-11-07 21:28:10 0000000100000052000000B1
20231107T230140 2023-11-07 23:12:41 0000000100000052000000B9
20231107T231341 2023-11-07 23:22:42 0000000100000052000000BB
20231107T234029 2023-11-07 23:48:10 0000000100000052000000C1
Step 12: Configure barman-cloud-backup to run on a schedule
Create the timer as /etc/systemd/system/barman-cloud-backup.timer:
[Unit]
Description=Run Barman Cloud Backup every 6 hours
[Timer]
OnCalendar=*-*-* 00/6:00:00
Persistent=true
[Install]
WantedBy=timers.target
Install the timer with:
sudo systemctl enable barman-cloud-backup.timer
sudo systemctl start barman-cloud-backup.timer
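Check the timer is registered and see when it will next fire:
systemctl list-timers barman-cloud-backup.timer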
Step 13: Configure PostgreSQL to use barman-cloud-wal-archive to archive WAL files to S3
Barman uses WAL archives for a restore. By configuring PostgreSQL to ship WAL archives directly to S3, we can achieve near-zero data loss on failure. We’ll do this by setting the archive_mode and archive_command configuration items in /etc/postgresql/15/main/postgresql.conf to the following values:
archive_mode = on
archive_command = 'barman-cloud-wal-archive --snappy s3://<container_name>/barman pg %p'
archive_mode tells PostgreSQL to process completed WAL segments with the archive_command. That means that as each WAL segment completes, it is uploaded to S3.
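Note that changing archive_mode requires a full PostgreSQL restart (a change to archive_command alone only needs a reload). Once it’s back up, the pg_stat_archiver view shows whether archiving is succeeding:
sudo systemctl restart postgresql
# archived_count should tick up as WAL segments complete; last_failed_wal should stay empty
sudo -u postgres psql -c "SELECT archived_count, last_archived_wal, last_failed_wal FROM pg_stat_archiver;"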
Step 14: Verify your backup works
I’ll write another page on this. For now, check out the barman-cloud-restore documentation.
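As a rough sketch of what a restore looks like – barman-cloud-restore pulls a base backup (by the ID shown in barman-cloud-backup-list) into an empty directory, after which you’d replay WALs with barman-cloud-wal-restore:
barman-cloud-restore --cloud-provider aws-s3 s3://<container_name>/barman pg 20231107T230140 /path/to/empty/data/dir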
And that’s it! Check healthchecks.io for exceptions, check your S3 storage costs, and periodically test a restore!