Fix: “0 functions loaded” deploying Azure Functions from package

This post is mainly a reminder to myself, because I’ve made the same mistake a few times in different projects.

How and why to deploy Azure functions from a package

When using Azure Functions, you can simplify deployment and maintenance of your function by deploying from a package file rather than directly into the function app. This also has the benefit of reducing cold start times, particularly where there are a large number of dependencies.

To deploy from a package (without using the Azure deployment tools), you:

  1. Create a zip file with your code and its dependencies
  2. Deploy that zip to Azure
  3. Set the WEBSITE_RUN_FROM_PACKAGE property on the function app settings

The WEBSITE_RUN_FROM_PACKAGE setting can either be 1, if you’ve deployed your code to the /home/data/SitePackages folder on the function app, or a URL to the package. I prefer to deploy my code as a zip stored in a blob, as this seems cleaner and is easier to upgrade. The steps are as follows (a rough az CLI sketch of the same flow appears after the list):

  1. Create a container (app-container)
  2. Upload the file to a blob called something like app-blob-datetime. By appending a timestamp to the blob name, subsequent deployments can be to a new blob (avoiding locking concerns) and then switching over is nearly instant and has almost zero downtime. If desired, it is simple to switch back to a previous version.
  3. Generate a long-lived SAS with read privileges scoped on the container (app-container)
  4. Construct a fully qualified URL to the blob, and set this as the WEBSITE_RUN_FROM_PACKAGE setting.
  5. Restart the function app (typically < 1 second).
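Under the assumption that the storage account, resource group, and function app are called mystorage, my-rg, and my-func (all placeholder names), the flow looks roughly like this with the az CLI:

# placeholder names: storage account "mystorage", resource group "my-rg", function app "my-func"
STAMP=$(date +%Y%m%d%H%M%S)

# 1. create the container (idempotent)
az storage container create --account-name mystorage --name app-container

# 2. upload the package to a timestamped blob
az storage blob upload --account-name mystorage --container-name app-container \
  --name "app-blob-$STAMP" --file build.zip

# 3. generate a long-lived, read-only SAS scoped to the container
SAS=$(az storage container generate-sas --account-name mystorage --name app-container \
  --permissions r --expiry 2030-01-01T00:00:00Z --output tsv)

# 4. point WEBSITE_RUN_FROM_PACKAGE at the blob URL plus the SAS
az functionapp config appsettings set --resource-group my-rg --name my-func \
  --settings WEBSITE_RUN_FROM_PACKAGE="https://mystorage.blob.core.windows.net/app-container/app-blob-$STAMP?$SAS"

# 5. restart the function app
az functionapp restart --resource-group my-rg --name my-func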

Error: 0 functions loaded

When browsing to the function app’s Overview page, we’re invited to “Create functions in your preferred environment”, implying that no functions exist:

functions overview page, including text "Create functions in your preferred environment"

Navigating to Monitoring > Logs, and selecting Traces shows something interesting (you may need to enable App Insights and restart the function app). First, we see a message 1 functions found (Custom), but immediately after it 0 functions loaded:

two log entries, 1 functions found (Custom) immediately after it 0 functions loaded

Fix: Ensure all packages are available

In order for the custom package to load, you need to ensure that all of the Python packages that it needs are available. To do this, you must install the packages into the folder before zipping it. You can do this with pip, setting a specific output folder, e.g.

pip install --disable-pip-version-check --target="build/.python_packages/lib/site-packages" -r app/requirements.txt

Then, when zipping, make sure the archive includes the .python_packages folder. Here’s my build script, which also ensures that we don’t include any files specified in .funcignore:

#!/bin/bash

# Remove and recreate the 'build' folder
rm -rf build
mkdir -p build

# export requirements.txt
poetry export --only main --without-hashes -f requirements.txt --output app/requirements.txt

# Copy the contents of the 'app' folder to 'build', including hidden files and folders
shopt -s dotglob
cp -r app/* build/

# Apply .funcignore to the contents of 'build'
if [ -f "app/.funcignore" ]; then
  while IFS= read -r pattern; do
    find build -name "$pattern" -exec rm -rf {} +
  done < app/.funcignore
fi

# https://github.com/Azure/azure-functions-host/issues/9720#issuecomment-2129618480
pip install --disable-pip-version-check --target="build/.python_packages/lib/site-packages" -r app/requirements.txt
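The script above stops after installing the dependencies; the deployable zip still needs to be created from the contents of build, zipping from inside the folder so that host.json and .python_packages end up at the root of the archive. A minimal sketch (build.zip is a placeholder name):

# create the package with paths relative to the package root
cd build
zip -r ../build.zip .
cd ..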

Fix: Pulumi error “Cannot modify this site because another operation is in progress” when creating an Azure Function on a Dynamic Consumption plan

Another post which is more to remind me than anything else. When creating an Azure Function in Pulumi, you may get the following error:

Cannot modify this site because another operation is in progress

After a bit of digging, I found this issue on the Pulumi repo, which points to a page on the Azure Functions wiki where they say:

If you run in a dedicated mode, you need to turn on the Always On setting for your Function App to run properly … When running in a Consumption Plan or Premium Plan you should not enable Always On. 

In the Python SDK, the Always On setting is part of the constructor of pulumi_azure_native.web.SiteConfigArgs:

(parameter) always_on: Input[bool] | None
always_on: true if Always On is enabled; otherwise, false.

Explicitly setting this to false (Always On must not be enabled on a Consumption plan) solved the problem.

Fix: WordPress error “The response is not a valid JSON response” when uploading images

When uploading images to WordPress, you may get this error. There are plenty of blogs online offering solutions, but they only apply to self-hosted instances – mine is hosted on just-what-i-find.onyx-sites.io/.

The error is a little pop-up with the text The response is not a valid JSON response at the bottom of the screen when you try to upload an image:

popup of The response is not a valid JSON

Looking in the developer tools console on the browser shows one of two error messages:

Failed to load resource: the server responded with a status of 403 () for URL /wp-json/wp/v2/media

or

POST https://atomic-temporary-181991729.wpcomstaging.com/wp-json/wp/v2/media?_locale=user 403 (Forbidden)

I have Cloudflare in front of my blog, with the OWASP rule set enabled. By examining the Security Events log (in Cloudflare, at Security > Events) and adding a filter for the path /wp-json/wp/v2/media:

screenshot showing that the `path` filter is mapped to the URI

I was able to see that the WAF was triggering on a specific rule, 949110: Inbound Anomaly Score Exceeded. There are lots of posts on the Cloudflare forum about this. One answer from the Cloudflare team points out that the OWASP ruleset is not managed by Cloudflare – they simply integrate it into their WAF, so they have no way to tweak it. They do, however, point out that you can bypass it. So I created a custom rule to match (http.request.uri.path eq "/wp-json/wp/v2/media"):

screenshot of the configuration settings as described above

I then selected “Skip specific rules from a Managed Ruleset”, and disabled rule 949110: Inbound Anomaly Score Exceeded for this specific URI:

screenshot showing the rule is being skipped

I placed this exception before the OWASP ruleset in the priority list:

screen shot of managed rules in order: cloudflare managed ruleset, skip 949110 for wp media upload, cloudflare owasp core ruleset, cloudflare leaked credentials check. all are enabled.

And now, no more errors. Of course, this will reduce the security protection of your WordPress instance – at least for this URI. See the Cloudflare documentation for more details.

Use Barman to back up PostgreSQL on an Azure VM to Azure Blob storage

In a previous post, I created a Barman backup script to back up PostgreSQL running in a VM to AWS S3. If you host your PostgreSQL server in Azure, this can get expensive quickly because you pay egress bandwidth fees to Microsoft. In this article, I’ll show you how to use Azure Blob storage instead.

  1. Step 1: Install Barman, barman-cli-cloud, snappy etc.
  2. Step 2: Install Azure CLI instead of AWS S3 CLI
  3. Step 3: Install azure-storage-blob Python package
  4. Step 4: Create an Azure storage account and a container
  5. Step 5: Assign a managed identity to the VM and grant it access to the container
  6. Step 6: Test that the credentials work by running a manual backup
  7. Step 7: Create a backup service
  8. Step 8: Create a timer
  9. Step 9: Configure WAL archiving to Azure
  10. Step 10: Test your backup

Step 1: Install Barman, barman-cli-cloud, snappy etc.

See the original article.

Step 2: Install Azure CLI instead of AWS S3 CLI

Follow the instructions on the Microsoft website.

Step 3: Install azure-storage-blob Python package

Log in as the postgres user and use pip to install azure-storage-blob:

sudo -u postgres /bin/bash
pip install azure-storage-blob

Step 4: Create an Azure storage account and a container

Follow the Microsoft documentation to create a new storage account in the same region as your VM (to avoid inter-region data transfer fees) and then create a container in it. I used standard storage, allowed access from private networks only (i.e. connecting from my VM via a private endpoint), and disabled soft deletes. I then created a container called backup. Note that disabling public networks will prevent you from browsing the container in the portal – you can change this later via the GUI.
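If you prefer the CLI to the portal, roughly equivalent commands look like this – the resource group, account name, and region are placeholders, and I’ve left out the private endpoint configuration:

# placeholder names: resource group "my-rg", storage account "pgbackupstore"
az storage account create --resource-group my-rg --name pgbackupstore \
  --location uksouth --sku Standard_LRS --kind StorageV2

az storage container create --account-name pgbackupstore --name backup --auth-mode login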

Be sure to create your storage account in the same availability zone as your VM. Microsoft is introducing inter-AZ bandwidth charges for VMs, so it’s likely that similar charges will be introduced for other services too.

Step 5: Assign a managed identity to the VM and grant it access to the container

Here, we’re basically following the instructions from the Microsoft website.

First, enable a system-managed identity for the VM: go to the VM in Azure, and under “Security” find the “Identity” panel. Under “System Assigned”, turn the status to “on”.

Go to the Azure Storage Account you created above. Navigate to the storage account itself (not the container in it), and then to “Access Control (IAM)”. Choose the option to add a role assignment. For the role, select Storage Blob Data Contributor. On the “Members” tab, search for the managed identity of the VM. Save the assignment.
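The same steps via the CLI, assuming the VM is called pg-vm, the resource group my-rg, and the storage account pgbackupstore (all placeholder names):

# enable a system-assigned managed identity on the VM and capture its principal ID
PRINCIPAL_ID=$(az vm identity assign --resource-group my-rg --name pg-vm \
  --query systemAssignedIdentity --output tsv)

# grant it Storage Blob Data Contributor on the storage account
STORAGE_ID=$(az storage account show --resource-group my-rg --name pgbackupstore \
  --query id --output tsv)
az role assignment create --assignee "$PRINCIPAL_ID" \
  --role "Storage Blob Data Contributor" --scope "$STORAGE_ID"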

Step 6: Test that the credentials work by running a manual backup

We need to do this as the postgres user, so that it has access to the PostgreSQL database files. First, we’ll log in as the postgres user with sudo -u postgres /bin/bash, then run a manual backup:

postgres@pg:~$ barman-cloud-backup -v --cloud-provider azure-blob-storage --azure-credential=managed-identity --snappy -d postgres "azure://yourcontainername.blob.core.windows.net/backup" "server_name"
2023-12-10 12:15:04,993 [327423] INFO: Authenticating to Azure with shared key
2023-12-10 12:15:05,068 [327423] INFO: Request URL: 'https://yourcontainername.blob.core.windows.net/backup?restype=REDACTED&comp=REDACTED'
Request method: 'GET'
Request headers:
'x-ms-version': 'REDACTED'
'Accept': 'application/xml'
'User-Agent': 'azsdk-python-storage-blob/12.19.0 Python/3.10.12 (Linux-6.2.0-1018-azure-x86_64-with-glibc2.35)'
'x-ms-date': 'REDACTED'
'x-ms-client-request-id': 'c0613fee-9755-11ee-a617-979c6065e555'
'Authorization': 'REDACTED'
No body was attached to the request
2023-12-10 12:15:05,138 [327423] INFO: Response status: 200
Response headers:
'Transfer-Encoding': 'chunked'
'Content-Type': 'application/xml'
'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
'x-ms-request-id': 'c33dbf1e-e01e-004d-0a62-2b4fc2000000'
'x-ms-client-request-id': 'c0613fee-9755-11ee-a617-979c6065e555'
'x-ms-version': 'REDACTED'
'Date': 'Sun, 10 Dec 2023 12:15:08 GMT'
2023-12-10 12:15:05,151 [327423] INFO: Starting backup '20231210T121505'
2023-12-10 12:15:05,194 [327423] INFO: Uploading 'pgdata' directory '/mnt/postgres/postgresql/15/main' as 'data.tar.snappy'
2023-12-10 12:15:05,522 [327428] INFO: Upload process started (worker 0)
2023-12-10 12:15:05,523 [327428] INFO: Authenticating to Azure with shared key
2023-12-10 12:15:05,541 [327429] INFO: Upload process started (worker 1)
2023-12-10 12:15:05,542 [327429] INFO: Authenticating to Azure with shared key
2023-12-10 12:15:05,545 [327428] INFO: Uploading 'PG/base/20231210T121505/data.tar.snappy', part '1' (worker 0)
...
2023-12-10 12:19:11,568 [327423] INFO: Backup end at LSN: 63/9A000138 (00000002000000630000009A, 00000138)
2023-12-10 12:19:11,568 [327423] INFO: Backup completed (start time: 2023-12-10 12:15:05.151810, elapsed time: 4 minutes, 6 seconds)
2023-12-10 12:19:11,569 [327429] INFO: Upload process stopped (worker 1)
2023-12-10 12:19:11,569 [327428] INFO: Upload process stopped (worker 0)

From this we can see that the backup was successful, two workers ran and it took just over 4 minutes. We’re now ready to create our script. Create it as ~postgres/backup-script.sh:

#!/bin/bash

# Variables
BACKUP_DIR="/var/lib/postgresql/backup"
DATE_SUFFIX=$(date +%F_%H-%M-%S)
LOG_FILE="$BACKUP_DIR/barman_backup_log_$DATE_SUFFIX.txt"
AZURE_CONTAINER="azure://yourcontainername.blob.core.windows.net/backup" # Replace with your Azure Blob Storage container URL
HEALTHCHECK_URL="https://hc-ping.com/<slug>"
SERVER_NAME="pg"
RETENTION_POLICY="RECOVERY WINDOW OF 30 DAYS" # Adjust the retention policy as needed
RETAIN_LOG_DAYS=7

# create backup temp dir if it doesn't exist
mkdir -p $BACKUP_DIR

# Redirect all output to log file
exec > "$LOG_FILE" 2>&1

# Function to send log to healthchecks.io
send_log() {
    local url="$1"
    curl -fsS --retry 3 -m 10 -X POST -H "Content-Type: text/plain" --data-binary "@$LOG_FILE" "$url"
}

# Perform backup with Barman
# don't use verbose (-v) as output will be too long for healthchecks.io
barman-cloud-backup --cloud-provider=azure-blob-storage --azure-credential=managed-identity --snappy -p 31432 -d postgres "$AZURE_CONTAINER" "$SERVER_NAME" || {
    send_log "$HEALTHCHECK_URL/fail"
    exit 1
}

# Delete old backups according to retention policy
barman-cloud-backup-delete --cloud-provider=azure-blob-storage --azure-credential=managed-identity --retention-policy "$RETENTION_POLICY" "$AZURE_CONTAINER" "$SERVER_NAME" || {
    send_log "$HEALTHCHECK_URL/fail"
    exit 1
}

# Notify healthchecks.io of success and send log
send_log "$HEALTHCHECK_URL"

# Finally, delete old log files in BACKUP_DIR
find "$BACKUP_DIR" -type f -name 'barman_backup_log_*.txt' -mtime +$RETAIN_LOG_DAYS -exec rm -f {} \;

Remember to make it executable with chmod +x backup-script.sh. Create the $BACKUP_DIR directory (in this script, it’s /var/lib/postgresql/backup because /var/lib/postgresql is the home directory of the postgres user). While logged in as postgres, use mkdir -p ~postgres/backup.

Step 7: Create a backup service

We now need to create our backup service. Use this content as /etc/systemd/system/barman-cloud-backup.service:

[Unit]
Description=Barman Cloud Backup Service

[Service]
Type=oneshot
ExecStart=/var/lib/postgresql/backup-script.sh
User=postgres

There’s no need to ‘enable’ the service because it has no [Install] section, i.e. it’s not triggered by another unit – we’ll use a timer later to trigger it. But for now, let’s test it:

sudo systemctl start barman-cloud-backup

You’ll see nothing for a while, then the command prompt will return. We can check the logs in $BACKUP_DIR (configured in the script). At the end, we can use systemctl to check for success:

rob@pg:~$ sudo systemctl status barman-cloud-backup.service
○ barman-cloud-backup.service - Barman Cloud Backup Service
Loaded: loaded (/etc/systemd/system/barman-cloud-backup.service; static)
Active: inactive (dead) since Sun 2023-12-10 12:57:01 UTC; 42s ago
TriggeredBy: ○ barman-cloud-backup.timer
Process: 330631 ExecStart=/var/lib/postgresql/backup-script.sh (code=exited, status=0/SUCCESS)
Main PID: 330631 (code=exited, status=0/SUCCESS)
CPU: 2min 41.835s

Dec 10 12:52:52 pg systemd[1]: Starting Barman Cloud Backup Service...
Dec 10 12:57:01 pg systemd[1]: barman-cloud-backup.service: Deactivated successfully.
Dec 10 12:57:01 pg systemd[1]: Finished Barman Cloud Backup Service.
Dec 10 12:57:01 pg systemd[1]: barman-cloud-backup.service: Consumed 2min 41.835s CPU time.

Finally, we can check on healthchecks.io to see that the ping was successful. We can also use barman-cloud-backup-list to list the backups:

postgres@pg:~$ barman-cloud-backup-list --cloud-provider=azure-blob-storage --azure-credential=managed-identity azure://containername.blob.core.windows.net/backup pg
Backup ID           End Time                 Begin Wal                     Archival Status  Name
20231210T124733     2023-12-10 12:51:39      00000002000000630000009F
20231210T125252     2023-12-10 12:56:58      0000000200000063000000A2

Step 8: Create a timer

Refer to the original article.

Step 9: Configure WAL archiving to Azure

See the main article for most of the settings, e.g. archive_mode, wal_level, archive_timeout. We’ll set archive_command in /etc/postgresql/15/main/postgresql.conf to the following:

archive_command = 'barman-cloud-wal-archive --snappy --cloud-provider=azure-blob-storage --azure-credential=managed-identity azure://pgcynexianetbackup.blob.core.windows.net/backup pg %p'

Now restart PostgreSQL:

sudo systemctl restart postgresql@15-main.service

If we browse to the container in the storage account, we can see that it has a folder for the server, then a base folder containing timestamped backups, and a wals folder containing the WAL files.
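If you don’t want to wait for archive_timeout, you can force PostgreSQL to finish the current WAL segment and then check that it arrives in the container – the storage account name below is a placeholder, and the pg/wals/ prefix assumes the server name used above:

# force a WAL segment switch so archive_command runs immediately
sudo -u postgres psql -c "SELECT pg_switch_wal();"

# list the archived WAL files in the container
az storage blob list --auth-mode login --account-name yourstorageaccount \
  --container-name backup --prefix pg/wals/ --output table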

Step 10: Test your backup

As always with a backup, test that it works by restoring. Check out my guide to restoring a Barman backup from S3 – the process is basically the same; just adjust the parameters to match the setup and steps above.

How to restore backups of PostgreSQL from S3 using Barman and barman-cloud-backup

In my previous post, I showed how to automate backups of a PostgreSQL database to S3 using Barman and barman-cloud-backup.

  1. Step 1: Verify hardware architecture and PostgreSQL version
  2. Step 2: Install Barman, barman-cli-cloud, AWS CLI, python-snappy
  3. Step 3: Verify S3 connectivity and identify the latest backup
  4. Step 4: Restore the data
  5. Step 5: Configure WAL recovery
  6. Step 6: Start the server
  7. Step 7: Set up scheduled backups and WAL archiving on the new server

Step 1: Verify hardware architecture and PostgreSQL version

For a successful restore, Barman requires that the hardware architecture and PostgreSQL version are both identical. You can verify these with some simple terminal commands:

rob@pg:~$ uname -m # report architecture
x86_64
rob@pg:~$ psql --version # PostgreSQL version
psql (PostgreSQL) 15.5 (Ubuntu 15.5-1.pgdg22.04+1)

Run these on both the source and target. Obviously, if you don’t have the source any more (which is why you’re restoring), you’ll need to make some assumptions…

Step 2: Install Barman, barman-cli-cloud, AWS CLI, python-snappy

Follow the instructions in steps 5-10 of my original article. When you’ve completed the restore, you can pick up from step 11 onwards to automate backups again.

If you no longer have the credentials for your IAM user, log in to the console and generate a secondary key pair. Once you’ve completed the restore, you can swap them and retire the old keys.

Step 3: Verify S3 connectivity and identify the latest backup

We can use the barman-cloud-backup-list command to see a list of our backups on S3. This will also verify connectivity.

rob@pg:~/.aws$ barman-cloud-backup-list s3://<bucket>/barman pg
Backup ID           End Time                 Begin Wal                     Archival Status  Name                
20231023T132628     2023-10-23 13:33:25      000000010000004E00000060                                           
20231103T130531     2023-11-03 13:22:11      000000010000005200000081                                           
20231103T135700     2023-11-03 14:11:59      000000010000005200000083                                           
20231107T211340     2023-11-07 21:28:10      0000000100000052000000B1                                           
20231107T230140     2023-11-07 23:12:41      0000000100000052000000B9                                           
20231107T231341     2023-11-07 23:22:42      0000000100000052000000BB                                           
20231107T234029     2023-11-07 23:48:10      0000000100000052000000C1                                           
20231108T060002     2023-11-08 06:12:46      0000000100000052000000C5                                           
20231108T120002     2023-11-08 12:32:50      0000000100000052000000C9                                           
20231108T180002     2023-11-08 18:13:02      0000000100000052000000CD                                           
20231109T000002     2023-11-09 00:26:54      0000000100000052000000D1                                           
20231109T060002     2023-11-09 06:12:19      0000000100000052000000D5                                           
20231109T120002     2023-11-09 12:11:19      0000000100000052000000D9                                           
20231109T180002     2023-11-09 18:12:55      0000000100000052000000DD                                           
20231110T000002     2023-11-10 00:25:49      0000000100000052000000E1                                           
20231110T060001     2023-11-10 06:18:26      0000000100000052000000E5                                           
20231110T120002     2023-11-10 12:15:52      0000000100000052000000E9                                           
20231110T180002     2023-11-10 18:12:24      0000000100000052000000EE                                           
20231111T000002     2023-11-11 00:27:31      0000000100000052000000F0                                           
20231111T060002     2023-11-11 06:13:08      0000000100000052000000F2                                           
20231111T120002     2023-11-11 12:10:25      0000000100000052000000F5  

Later, we’ll restore the most recent backup, 20231111T120002.

Step 4: Restore the data

First, check the data_directory setting on the new server to see where to restore to. In my server setup, I moved this to another drive:

rob@new_server:~$ sudo -u postgres psql -c "show data_directory;"
could not change directory to "/home/rob": Permission denied
          data_directory          
----------------------------------
 /mnt/postgres/postgresql/15/main
(1 row)

Stop PostgreSQL then rename the folder:

rob@pg:~$ sudo systemctl stop postgresql
rob@pg:~$ sudo systemctl status postgresql
○ postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Sat 2023-11-11 12:18:06 UTC; 4s ago
   Main PID: 850 (code=exited, status=0/SUCCESS)
        CPU: 3ms

Nov 10 18:11:06 pg systemd[1]: Starting PostgreSQL RDBMS...
Nov 10 18:11:06 pg systemd[1]: Finished PostgreSQL RDBMS.
Nov 11 12:18:06 pg systemd[1]: postgresql.service: Deactivated successfully.
Nov 11 12:18:06 pg systemd[1]: Stopped PostgreSQL RDBMS.
rob@pg:~$ sudo mv /mnt/postgres/postgresql/15/main /mnt/postgres/postgresql/15/main_old

We’re now ready to execute the restore procedure. We do this as the postgres user so that we avoid permission issues. Switch to it with sudo -i -u postgres. Then run the restore command; 20231111T120002 is the ID of the backup we want to restore:

postgres@new_server:~$ barman-cloud-restore --verbose s3://<bucket>/barman pg 20231111T120002 /mnt/postgres/postgresql/15/main
2023-11-11 12:24:48,875 [218010] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-11 12:24:49,380 [218010] INFO: Found file from backup '20231111T120002' of server 'pg': barman/pg/base/20231111T120002/data.tar.snappy

Once that completes we can inspect the data folder and see it contains the right data:

postgres@new_server:~$ ls /mnt/postgres/postgresql/15/main
PG_VERSION    base    pg_commit_ts  pg_hba.conf    pg_logical    pg_notify    pg_serial     pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
backup_label  global  pg_dynshmem   pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    pg_wal       postgresql.auto.conf

We now need to copy the config files postgresql.conf, pg_hba.conf and pg_ident.conf into the PostgreSQL configuration directory, /etc/postgresql/15/main. First, we’ll rename the existing files, then copy the restored ones. Do this as the postgres user:

postgres@new_server:~$ mv /etc/postgresql/15/main/postgresql.conf /etc/postgresql/15/main/postgresql.conf.backup
postgres@new_server:~$ cp /mnt/postgres/postgresql/15/main/postgresql.conf /etc/postgresql/15/main/
postgres@new_server:~$ mv /etc/postgresql/15/main/pg_hba.conf /etc/postgresql/15/main/pg_hba.conf.backup
postgres@new_server:~$ cp /mnt/postgres/postgresql/15/main/pg_hba.conf /etc/postgresql/15/main/
postgres@new_server:~$ mv /etc/postgresql/15/main/pg_ident.conf /etc/postgresql/15/main/pg_ident.conf.backup
postgres@new_server:~$ cp /mnt/postgres/postgresql/15/main/pg_ident.conf /etc/postgresql/15/main/

We need to create the recovery signal to tell PostgreSQL to initialise a recovery:

touch /mnt/postgres/postgresql/15/main/recovery.signal

Step 5: Configure WAL recovery

First, we need to stop the server from overwriting the bucket contents by disabling WAL archiving: uncomment the archive_mode setting if necessary, then set it to off:

sudo sed -i '/^#archive_mode/c\archive_mode = on' /etc/postgresql/*/main/postgresql.conf
sudo sed -i 's/^archive_mode\s*=.*/archive_mode = off/' /etc/postgresql/*/main/postgresql.conf

Just to be sure, we comment out archive_command:

sudo sed -i 's/^\(archive_command\s*=.*\)/#\1/' /etc/postgresql/*/main/postgresql.conf

We then need to set restore_command and recovery_target_timeline in /etc/postgresql/15/main/postgresql.conf. We’ll use the barman-cloud-wal-restore command:

sudo sed -i '/^#restore_command/c\restore_command = '"'"'barman-cloud-wal-restore s3://pg.cynexia.net-backup/barman pg %f %p'"'" /etc/postgresql/*/main/postgresql.conf
sudo sed -i '/^#recovery_target_timeline/c\recovery_target_timeline = '"'"'latest'"'" /etc/postgresql/*/main/postgresql.conf
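As a quick sanity check before starting the server, you can grep the settings we’ve just changed – archive_command should no longer appear because we commented it out:

grep -E '^(archive_mode|archive_command|restore_command|recovery_target_timeline)' /etc/postgresql/*/main/postgresql.conf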

Step 6: Start the server

Start the server:

sudo systemctl start postgresql

Check the logs with sudo tail -n 100 /var/log/postgresql/postgresql-15-main.log. Fix any configuration errors by referring to your backed-up configuration files.

If you get an error like this:
2023-11-11 17:48:15.594 UTC {[local]} [260249] postgres@template1 FATAL: database locale is incompatible with operating system
2023-11-11 17:48:15.594 UTC {[local]} [260249] postgres@template1 DETAIL: The database was initialized with LC_COLLATE “en_US.UTF-8”, which is not recognized by setlocale().
2023-11-11 17:48:15.594 UTC {[local]} [260249] postgres@template1 HINT: Recreate the database with another locale or install the missing locale.
You can fix this by installing the correct locale:
sudo locale-gen en_US.UTF-8
sudo update-locale

You can check that the restore was successful by running a query against the database, and checking that the /mnt/postgres/postgresql/15/main/recovery.signal file is gone.
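For example, something along these lines (the path matches the data directory used above):

# should return 'f' once recovery has finished
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"

# should report 'No such file or directory' after a successful recovery
ls /mnt/postgres/postgresql/15/main/recovery.signal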

Step 7: Set up scheduled backups and WAL archiving on the new server

Follow steps 11-13 on my ‘set up backup’ post. This will re-enable WAL archiving to S3, and create a full backup on a schedule.

How to automate backups of a PostgreSQL server using Barman/barman-cloud-backup to S3

I was surprised not to find many up-to-date instructions on this. I have a few basic requirements:

  • Back up daily to an S3 bucket
  • Keep a certain number of backups
  • Run automatically, preferably using systemd not cron as it’s easier to set up and troubleshoot
  • Use a user with least privileges on the database, operating system, and in AWS/S3
  • Send the results of each backup activity to healthchecks.io

After a bit of playing around, I decided to use Barman for the backups – it’s significantly easier to configure and use than pgBackRest and has native support for backing up to S3, point-in-time restore, and more. The major downside compared to, say, running pg_dump every night, is that it requires an identical setup to restore to – identical hardware architecture and PostgreSQL version. Least privileges in the database is tricky – to be able to back up things like roles, the account basically needs full access to all schemas. The Barman documentation says that it should run as the same user as PostgreSQL, postgres.

  1. Step 1: Create an S3 bucket
  2. Step 2: Create an IAM Policy to grant access to the bucket
  3. Step 3: Create a new S3 user, assign the policy and generate credentials
  4. Step 4: Create a new check on healthchecks.io
  5. Step 5: Install AWS CLI on the server
  6. Step 6: Authenticate your new user with IAM credentials
  7. Step 7: Install prerequisites for python-snappy compression library
  8. Step 8: Download and install Barman
  9. Step 9: Share AWS credentials with postgres user
  10. Step 10: Install python-snappy as user postgres
  11. Step 11: Create backup service
  12. Step 12: Configure barman-cloud-backup to run on a schedule
  13. Step 13: Configure PostgreSQL to use barman-cloud-wal-archive to archive WAL files to S3
  14. Step 14: Verify your backup works

Step 1: Create an S3 bucket

This one’s pretty simple. Just follow the instructions on the Amazon website.

Step 2: Create an IAM Policy to grant access to the bucket

We use an IAM policy to grant only the specific access that the service account needs. Go to the IAM console, select “Policies” on the left, and create a new policy. This is the template; substitute <container_name> with your bucket name, obviously:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::<container_name>/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<container_name>"
            ]
        }
    ]
}

Step 3: Create a new S3 user, assign the policy and generate credentials

In the IAM console, select Users > Create user. Give them a unique name. Do NOT grant console access. Click Next. On the “Set permissions” page, select “Attach policies directly” and attach the policy you just created. It’s easier if you “Filter by Type” and select “Customer managed”. Select Next, then Review and Create. Let’s assume you’ve created a user called backup_user.

Once you’ve created backup_user, click on their name in the list and go to the “Security Credentials” tab. Click “Create Access Key” and then select “Other” from the list of options. We need a long-lived key, so this is the best approach (unless you want to go and re-authenticate them every month??). Create the access key and then copy and note down both the Access Key and Secret. Do this now or you won’t be able to access them again and you’ll need to regenerate them.

Step 4: Create a new check on healthchecks.io

I use healthchecks.io to keep track of all the scheduled tasks and processes I’m expecting to run. Log in and create a new health check. Note the URL.

Step 5: Install AWS CLI on the server

I found that this mostly went as expected. I followed the instructions on the AWS website; however, as I’d hardened my server using the Ubuntu CIS hardening baseline, I had to set some additional permissions:

sudo chmod -R 755 /usr/local/aws-cli

Step 6: Authenticate your new user with IAM credentials

Run aws configure. Enter the Access Key ID and Secret Access Key recorded in the step above. This generates a file at ~/.aws/credentials which contains these details. Later we’ll copy this to our postgres user’s home directory – but first we need to test our backup.
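A quick way to confirm the credentials and the policy work before going any further is to list the bucket (the bucket name is a placeholder for the one created in Step 1):

aws s3 ls "s3://<bucket_name>"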

Step 7: Install prerequisites for python-snappy compression library

We’re going to use the snappy compression algorithm because of its significant performance improvements over the defaults while still achieving approximately a 2:1 compression ratio (saving on both egress and S3 storage costs). First, install the required library and pip:

sudo apt-get install libsnappy-dev python3-pip

Then we install the package – we’ll need to do this again for the postgres user later, as the package is installed to user packages, not site packages.

pip install python-snappy

Step 8: Download and install Barman

Barman is super easy to install. In my server setup, I added the PostgreSQL repos to my server – if you haven’t added the repo, follow the instructions there (which are slightly different from the ones on the PostgreSQL wiki), then simply install it. We’ll also install barman-cli-cloud, allowing us to back up to S3:

sudo apt-get install barman barman-cli-cloud

Although the documentation says we should configure Barman specifically for local backup by setting backup_method to local-rsync for our local server in a specific configuration file, we don’t actually need to do that: barman-cloud-backup is a standalone script that simply uses Barman. We can quickly test our backup. Note I’ve already set up a .pgpass file for the postgres_admin user:

rob@pg:~$ sudo -E barman-cloud-backup -v --cloud-provider aws-s3 --snappy --host localhost -U postgres_admin -d postgres s3://<container_name>/barman pg
2023-11-07 23:01:40,171 [1139430] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:01:40,749 [1139430] INFO: Starting backup '20231107T230140'
2023-11-07 23:01:41,408 [1139430] INFO: Uploading 'pgdata' directory '/mnt/postgres/postgresql/15/main' as 'data.tar.snappy'
2023-11-07 23:01:51,430 [1139436] INFO: Upload process started (worker 1)
2023-11-07 23:01:51,428 [1139435] INFO: Upload process started (worker 0)
2023-11-07 23:01:51,533 [1139436] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:01:51,534 [1139435] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:01:51,680 [1139435] INFO: Uploading 'barman/pg/base/20231107T230140/data.tar.snappy', part '1' (worker 0)
2023-11-07 23:01:58,138 [1139436] INFO: Uploading 'barman/pg/base/20231107T230140/data.tar.snappy', part '2' (worker 1)
...
2023-11-07 23:12:38,601 [1139436] INFO: Uploading 'barman/pg/base/20231107T230140/data.tar.snappy', part '278' (worker 1)
2023-11-07 23:12:41,232 [1139430] INFO: Uploading 'pg_control' file from '/mnt/postgres/postgresql/15/main/global/pg_control' to 'data.tar.snappy' with path 'global/pg_control'
2023-11-07 23:12:41,248 [1139430] INFO: Uploading 'config_file' file from '/etc/postgresql/15/main/postgresql.conf' to 'data.tar.snappy' with path 'postgresql.conf'
2023-11-07 23:12:41,249 [1139430] INFO: Uploading 'hba_file' file from '/etc/postgresql/15/main/pg_hba.conf' to 'data.tar.snappy' with path 'pg_hba.conf'
2023-11-07 23:12:41,249 [1139430] INFO: Uploading 'ident_file' file from '/etc/postgresql/15/main/pg_ident.conf' to 'data.tar.snappy' with path 'pg_ident.conf'
2023-11-07 23:12:41,250 [1139430] INFO: Stopping backup '20231107T230140'
2023-11-07 23:12:41,545 [1139430] INFO: Restore point 'barman_20231107T230140' successfully created
2023-11-07 23:12:41,546 [1139430] INFO: Uploading 'backup_label' file to 'data.tar.snappy' with path 'backup_label'
2023-11-07 23:12:41,546 [1139430] INFO: Marking all the uploaded archives as 'completed'
2023-11-07 23:12:41,547 [1139435] INFO: Uploading 'barman/pg/base/20231107T230140/data.tar.snappy', part '279' (worker 0)
2023-11-07 23:12:41,745 [1139436] INFO: Completing 'barman/pg/base/20231107T230140/data.tar.snappy' (worker 1)
2023-11-07 23:12:41,880 [1139430] INFO: Calculating backup statistics
2023-11-07 23:12:41,886 [1139430] INFO: Uploading 'barman/pg/base/20231107T230140/backup.info'
2023-11-07 23:12:42,016 [1139430] INFO: Backup end at LSN: 52/B91715B0 (0000000100000052000000B9, 001715B0)
2023-11-07 23:12:42,017 [1139430] INFO: Backup completed (start time: 2023-11-07 23:01:40.749792, elapsed time: 11 minutes, 1 second)
2023-11-07 23:12:42,021 [1139435] INFO: Upload process stopped (worker 0)
2023-11-07 23:12:42,022 [1139436] INFO: Upload process stopped (worker 1)

Step 9: Share AWS credentials with postgres user

Barman and barman-cloud-backup both require read access to the PostgreSQL storage. So we need to run our backup job as the postgres user. To make this work, we’ll copy our AWS credentials to them:

sudo mkdir ~postgres/.aws
sudo cp ~/.aws/credentials ~postgres/.aws/credentials
sudo chmod 0600 ~postgres/.aws/credentials
sudo chown -R postgres: ~postgres/.aws

Step 10: Install python-snappy as user postgres

We quickly need to log in and install the python-snappy package for the postgres user. First, log in as them:

sudo -i -u postgres 

If you get this error on logging in as the user:
rob@pg:~$ sudo -i -u postgres
sudo: unable to change directory to /var/lib/postgresql: No such file or directory

you’ll need to create the user’s home directory. First, log out of the postgres user, then check the home directory:
rob@pg:~$ getent passwd postgres
postgres:x:116:122:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash

then create it:
sudo mkdir -p /var/lib/postgresql
sudo chown postgres:postgres /var/lib/postgresql

Then, once logged in as postgres, install the package:

pip install python-snappy
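Still logged in as postgres, it’s worth confirming that the copied AWS credentials and the snappy module both work (assuming the aws binary is on the postgres user’s PATH, and substituting your bucket name):

# confirm the snappy bindings import cleanly
python3 -c "import snappy; print('snappy OK')"

# confirm the copied credentials can see the bucket
aws s3 ls "s3://<container_name>"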

Step 11: Create backup service

We want the backup to run on a schedule, so we’ll create three files: a backup script, a systemd service, and a systemd timer to trigger it. Firstly, we’ll check the home directory of the postgres user:

postgres@pg:~$ getent passwd postgres
postgres:x:116:122:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash

We can see that it’s /var/lib/postgresql. If yours is different, adjust these scripts. We need to use the absolute path because ~ won’t be expanded when running as a service. Create this file with sudo nano ~postgres/backup-script.sh, obviously substituting your S3 bucket, healthchecks.io UUID and retention policy. We’re using peer authentication to allow the postgres user to sign in without a password:

#!/bin/bash

# Variables
BACKUP_DIR="/var/lib/postgresql/backup"
DATE_SUFFIX=$(date +%F_%H-%M-%S)
LOG_FILE="$BACKUP_DIR/barman_backup_log_$DATE_SUFFIX.txt"
S3_BUCKET="s3://<container_name>/barman"
HEALTHCHECK_URL="https://hc-ping.com/<UUID>"
SERVER_NAME="pg"
RETENTION_POLICY="RECOVERY WINDOW OF 30 DAYS"  # Adjust the retention policy as needed
RETAIN_LOG_DAYS=7

# create backup temp dir if it doesn't exist
mkdir -p $BACKUP_DIR

# Redirect all output to log file
exec > "$LOG_FILE" 2>&1

# Function to send log to healthchecks.io
send_log() {
    local url="$1"
    curl -fsS --retry 3 -m 10 -X POST -H "Content-Type: text/plain" --data-binary "@$LOG_FILE" "$url"
}

# Perform backup with Barman
barman-cloud-backup -v --cloud-provider aws-s3 --snappy -d postgres --port 1234 "$S3_BUCKET" "$SERVER_NAME" || {
    send_log "$HEALTHCHECK_URL/fail"
    exit 1
}

# Delete old backups according to retention policy
barman-cloud-backup-delete --cloud-provider aws-s3 --retention-policy "$RETENTION_POLICY" "$S3_BUCKET" "$SERVER_NAME" || {
    send_log "$HEALTHCHECK_URL/fail"
    exit 1
}

# Notify healthchecks.io of success and send log
send_log "$HEALTHCHECK_URL"

# Finally, delete old log files in BACKUP_DIR
find "$BACKUP_DIR" -type f -name 'barman_backup_log_*.txt' -mtime +$RETAIN_LOG_DAYS -exec rm -f {} \;

Make sure that the postgres user owns it and it’s executable:

sudo chown -R postgres: ~postgres/backup-script.sh
sudo chmod +x ~postgres/backup-script.sh

Create this file as /etc/systemd/system/barman-cloud-backup.service:

[Unit]
Description=Barman Cloud Backup Service

[Service]
Type=oneshot
ExecStart=/var/lib/postgresql/backup-script.sh
User=postgres

Test the service with sudo systemctl start barman-cloud-backup. You can check the status using systemctl too – although you’ll need to do it from a second terminal, as the start command blocks until the oneshot service finishes. Here we can see that the service is running:

rob@pg:~$ sudo systemctl status barman-cloud-backup
● barman-cloud-backup.service - Barman Cloud Backup Service
     Loaded: loaded (/etc/systemd/system/barman-cloud-backup.service; static)
     Active: activating (start) since Tue 2023-11-07 23:34:26 UTC; 16s ago
   Main PID: 1143268 (backup-script.s)
      Tasks: 2 (limit: 2220)
     Memory: 50.7M
        CPU: 10.325s
     CGroup: /system.slice/barman-cloud-backup.service
             ├─1143268 /bin/bash /var/lib/postgresql/backup-script.sh
             └─1143271 /usr/bin/python3 /usr/bin/barman-cloud-backup -v --cloud-provider aws-s3 --snappy -d postgres --port 1234 s3://<container_name>/barman pg

Nov 07 23:34:26 pg systemd[1]: Starting Barman Cloud Backup Service...

We can check the log file too:

rob@pg:~$ sudo ls -ltr ~postgres/backup
total 36
-rw-r--r-- 1 postgres postgres 36296 Nov  7 23:48 barman_backup_log_2023-11-07_23-34-25.txt
rob@pg:~$ sudo cat ~postgres/backup/barman_backup_log_2023-11-07_23-34-25.txt
2023-11-07 23:34:26,705 [1143271] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:34:27,263 [1143271] INFO: Starting backup '20231107T233427'
2023-11-07 23:34:33,420 [1143271] INFO: Uploading 'pgdata' directory '/mnt/postgres/postgresql/15/main' as 'data.tar.gz'
2023-11-07 23:35:04,095 [1143316] INFO: Upload process started (worker 1)
2023-11-07 23:35:04,097 [1143315] INFO: Upload process started (worker 0)
2023-11-07 23:35:04,213 [1143316] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:35:04,220 [1143315] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-07 23:35:04,352 [1143316] INFO: Uploading 'barman/pg/base/20231107T233427/data.tar.gz', part '1' (worker 1)
2023-11-07 23:35:35,917 [1143315] INFO: Uploading 'barman/pg/base/20231107T233427/data.tar.gz', part '2' (worker 0)
2023-11-07 23:36:03,578 [1143316] INFO: Uploading 'barman/pg/base/20231107T233427/data.tar.gz', part '3' (worker 1)
...

Eventually, the backup will complete and we can check it in Healthchecks.io. We can also use barman-cloud-backup-list to list the backups:

rob@pg:~$ barman-cloud-backup-list s3://<container>/barman pg
Backup ID           End Time                 Begin Wal                     Archival Status  Name                
20231023T132628     2023-10-23 13:33:25      000000010000004E00000060                                           
20231103T130531     2023-11-03 13:22:11      000000010000005200000081                                           
20231103T135700     2023-11-03 14:11:59      000000010000005200000083                                           
20231107T211340     2023-11-07 21:28:10      0000000100000052000000B1                                           
20231107T230140     2023-11-07 23:12:41      0000000100000052000000B9                                           
20231107T231341     2023-11-07 23:22:42      0000000100000052000000BB                                           
20231107T234029     2023-11-07 23:48:10      0000000100000052000000C1                                           

Step 12: Configure barman-cloud-backup to run on a schedule

Create the timer as /etc/systemd/system/barman-cloud-backup.timer:

[Unit]
Description=Run Barman Cloud Backup every 6 hours

[Timer]
OnCalendar=*-*-* 00/6:00:00
Persistent=true

[Install]
WantedBy=timers.target

Enable and start the timer with:

sudo systemctl enable barman-cloud-backup.timer
sudo systemctl start barman-cloud-backup.timer
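You can confirm the timer is registered and see when it will next fire with:

systemctl list-timers barman-cloud-backup.timer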

Step 13: Configure PostgreSQL to use barman-cloud-wal-archive to archive WAL files to S3

Barman uses WAL archives for a restore. By configuring PostgreSQL to ship WAL segments directly to S3, we can achieve almost no loss of data on failure. We’ll do this by setting the archive_command and archive_mode configuration items in /etc/postgresql/15/main/postgresql.conf to the following values:

archive_mode = on
archive_command = 'barman-cloud-wal-archive --snappy s3://<container_name>/barman pg %p'

archive_mode tells PostgreSQL to process completed WAL segments with the archive_command; that means that as soon as a WAL segment is completed, it is uploaded to S3.
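Changing archive_mode needs a restart rather than a reload, so finish with something like this and confirm the settings took effect:

sudo systemctl restart postgresql
sudo -u postgres psql -c "SHOW archive_mode;" -c "SHOW archive_command;"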

Step 14: Verify your backup works

I’ve written a whole article on how to restore from a barman backup – or you could check out the barman-cloud-restore documentation.

And that’s it! Check healthchecks.io for exceptions, check your S3 storage costs, and periodically test a restore!

Migrate your TimescaleDB from timescale.com to self hosted

I’ve been using timescale.com for about a year, but it’s quite expensive for a hobbyist (their cheapest plan in the UK is about $50/month), so I thought I’d try and implement something cheaper. I know I won’t get the cool automatic DR failover or the sexy continuous backups – but it’s not really mission critical if I lose a small amount of the history of my home energy consumption. Timescale publish some instructions, but they weren’t complete, and didn’t work for me.

  1. Step 1: Install the base OS
  2. Step 2: Install PostgreSQL and move the data to another drive (and change the port)
  3. Step 3: Install TimescaleDB
  4. Step 4: Back up your existing DB
  5. Step 5: Create roles in the database
  6. Step 6: Actually restore the data
  7. Step 7: Allow external network connections
  8. Next steps

Step 1: Install the base OS

I use Linode. I chose a Linode 2GB, which comes in at around $12/month for 1 shared CPU and 2GB RAM. This is about double what I had on Timescale, which was 0.5 CPU. I added a 50GB block storage volume, which would allow me to rebuild my Linode without worrying about losing my database data, bringing my total to $17/month – but this isn’t necessary.

I installed Ubuntu 22.04 LTS, set up automatic updates, disabled SSH password auth, etc. using my standard setup scripts. I decided to use ZFS as the file system for the data drive to take advantage of the native error checking capabilities. It does have some performance implications, but for a single drive with a relatively small database, I think this is an acceptable tradeoff, especially when compared to the significant performance hit of Btrfs. I called the additional volume postgres, then mounted this at /mnt/postgres

First, install ZFS and create the ZFS pool:

sudo apt-get install zfsutils-linux
sudo zpool create postgres_storage /dev/disk/by-id/scsi-0Linode_Volume_postgres

If the volume was previously used for something else, ZFS will warn you:
/dev/disk/by-id/scsi-0Linode_Volume_postgres contains a filesystem of type 'ext4'
You’ll need to erase the existing file system before creating the new ZFS pool. Obviously this is destructive and you’ll lose all the data on the drive:
sudo wipefs --all --force /dev/disk/by-id/scsi-0Linode_Volume_postgres

Create a new ZFS dataset and mount it:

sudo zfs create postgres_storage/data
sudo zfs set mountpoint=/mnt/postgres postgres_storage/data

There’s no need to modify /etc/fstab as ZFS will automatically manage mounting of datasets within a pool at boot time.
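A quick check that the pool is healthy and the dataset is mounted where we expect:

sudo zpool status postgres_storage
sudo zfs list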

Step 2: Install PostgreSQL and move the data to another drive (and change the port)

Next, I had to install PostgreSQL and TimescaleDB. First, install PostgreSQL itself. Timescale’s latest supported version is 15. I had to change the instructions slightly from the PostgreSQL website because they use the legacy apt keyring.

# create repository info
sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'

# Import the repository signing key - note this instruction is different than the PostgreSQL website.
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmour -o /etc/apt/trusted.gpg.d/postgres.gpg

# fix permissions on the new key
sudo chmod 644 /etc/apt/trusted.gpg.d/*.gpg

# Update the package lists:
sudo apt-get update

# Install version 15 PostgreSQL:
sudo apt-get -y install postgresql-15

OK, great! Now let’s move the database storage to our new ZFS volume. First, I need to stop the service:

sudo systemctl stop postgresql

then open the config file to edit:

sudo nano /etc/postgresql/15/main/postgresql.conf

I also want to change the port that PostgreSQL listens on, so find the port line and edit it:

port = 15433

and move the data by updating the data_directory property:

data_directory = '/mnt/postgres/postgresql/15/main'

Finally, move the database, update permissions and restart PostgreSQL:

sudo cp -ra /var/lib/postgresql /mnt/postgres/
sudo chown -R postgres:postgres /mnt/postgres/
sudo chmod -R 700 /mnt/postgres/
sudo systemctl start postgresql

Verify the new location:

rob@server:~$ sudo -u postgres psql -c "show data_directory;"
          data_directory          
----------------------------------
 /mnt/postgres/postgresql/15/main
(1 row)

If there are any problems, check the logs with sudo cat /var/log/postgresql/postgresql-15-main.log

Delete the old storage so that we don’t get confused:

sudo rm -r /var/lib/postgresql/15/main

To make it easy to remember the port (i.e. so that I don’t need to set it every time), I set it as an export:

echo "export PGPORT=15433" >> ~/.bashrc 
source ~/.bashrc

Step 3: Install TimescaleDB

First, log in to the new server, then add the TimescaleDB repo:

echo "deb https://packagecloud.io/timescale/timescaledb/$(lsb_release -i -s | tr '[:upper:]' '[:lower:]')/ $(lsb_release -c -s) main" | sudo tee /etc/apt/sources.list.d/timescaledb.list

#fix permissions
sudo chmod 644 /etc/apt/sources.list.d/*.list

Then add the TimescaleDB GPG key:

wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/timescaledb.gpg

# fix permissions
sudo chmod 644 /etc/apt/trusted.gpg.d/*.gpg

Update and install:

sudo apt update
sudo apt install timescaledb-2-postgresql-15

The extension needs to be preloaded, so edit the config file and add it. Assuming there are no other preloaded extensions, just run this:

echo "shared_preload_libraries = 'timescaledb'" >> /etc/postgresql/15/main/postgresql.conf

To finish off, run timescaledb-tune to set up the extension:

sudo timescaledb-tune

Increase max_locks_per_transaction to 1024 in the config file /etc/postgresql/*/main/postgresql.conf. First, uncomment it if it’s commented out:

sudo sed -i.bak -E '/^[[:space:]]*#?[[:space:]]*max_locks_per_transaction[[:space:]]*=[[:space:]]*[0-9]+/s/^([[:space:]]*)#?/\1/' /etc/postgresql/*/main/postgresql.conf

Then set the value to 1024:

sudo sed -i.bak -E '/^[[:space:]]*max_locks_per_transaction[[:space:]]*=[[:space:]]*[0-9]+/s/[0-9]+/1024/' /etc/postgresql/*/main/postgresql.conf

Restart PostgreSQL with sudo systemctl restart postgresql

If you have any other extensions installed, be sure to install them now. For example, I also have PostGIS installed, so I followed the instructions on the PostGIS website – it uses the same repo as PostgreSQL, so I just needed to install the package with sudo apt-get install postgresql-15-postgis-3 and I was done.

Step 4: Back up your existing DB

This part is pretty easy. Stop any processes which are using the database so you don’t lose data. Then run this command, substituting your own details for -h, -p and -U. I believe tsdb is the default database name on timescale.com, but check yours. Do this from the new server – the slowest part of the process is the restore, so we want the files locally. That also lets you redo the restore if it fails without transferring the data again.

pg_dump -Fd -j 1 -h c99aarsjk6.b099as2kb2.tsdb.cloud.timescale.com -p 15433 -U tsdbadmin -d tsdb -f _dump

I found that trying to run more than one job in parallel just failed with memory errors. I ended up with about 500GB of data for a ~2,500GB database.
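Before kicking off the restore, you can sanity-check the dump directory with pg_restore’s list mode, which only reads the dump and doesn’t touch any database:

pg_restore -l _dump | head -n 20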

Step 5: Create roles in the database

This was trickier. You have to run a few more magic steps on the new server.

Timescale.com doesn’t let you dump roles because roles are cluster-wide. But if you’re migrating from your own server, you can do it with the command pg_dumpall --roles-only -U postgres_admin -h localhost -f roles.sql
You’ll need to filter roles.sql to only include the roles you’re interested in – but it’s much easier.

We need to create the database and the roles manually, as pg_restore doesn’t create roles, it only assigns privileges. First, start psql and connect to the hosted timescale.com database:

psql -h c99aarsjk6.b099as2kb2.tsdb.cloud.timescale.com -p 15433 -U tsdbadmin -d tsdb

You can find your roles with the \du command e.g.

tsdb=> \du
   Role name      |            Attributes             |    Member of    
------------------+-----------------------------------+------------------
 conditionseditor | Cannot login                      | {}               
 functionrestapi  |                                   | {conditionseditor}
 functionwriter   |                                   | {conditionseditor}
 postgres         | Superuser, Create role, Create DB,| {}               
                  | Replication, Bypass RLS           |                  
 postgres_admin   | Superuser, Create DB              | {}               
 tsdbadmin        |                                   | {}               
 tsdbexplorer     | Cannot login                      | {}               

You’ll need to create these roles and grant the memberships shown in the “Member of” column. Naturally, you should use long, complex passwords. To connect to the new database, create a postgres_admin user with a password. We do this by connecting as the postgres user:

sudo -u postgres psql

then create the user with a password:

CREATE ROLE postgres_admin;
ALTER ROLE postgres_admin WITH SUPERUSER INHERIT NOCREATEROLE CREATEDB LOGIN NOREPLICATION NOBYPASSRLS PASSWORD 'supersecretpassword';

Quit with \q.

Create a .pgpass file to store the password for login and set permissions on it to secure it. The docs say you can use * for hostname, port, database or username.

echo "localhost:*:*:postgres_admin:supersecretpassword" | tee ~/.pgpass
chmod 600 ~/.pgpass

We can then connect to the database using psql and our new postgres_admin user. You need -h localhost so that psql connects over TCP and uses password authentication rather than peer authentication.

psql -h localhost -U postgres_admin postgres

then create the roles:

CREATE ROLE conditionseditor WITH NOLOGIN;
CREATE ROLE functionrestapi WITH LOGIN PASSWORD '<pw_here>';
CREATE ROLE functionwriter WITH LOGIN PASSWORD '<pw_here>';
GRANT conditionseditor TO functionrestapi;
GRANT conditionseditor TO functionwriter;
CREATE ROLE tsdbadmin WITH LOGIN PASSWORD '<pw_here>';
CREATE ROLE tsdbexplorer NOLOGIN;

Don’t forget the admin users (tsdb… etc.).

Next, create your database, switch to it, and run timescaledb_pre_restore(), which puts the database into the right state for a restore. Inside psql:

CREATE DATABASE tsdb;
\c tsdb
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT timescaledb_pre_restore();

then quit with \q.

Be sure to create the extension with the same version as on your current database server. Check the version with \dx timescaledb, then use CREATE EXTENSION IF NOT EXISTS timescaledb VERSION 'xxx'; where xxx is the version in your current environment.

Step 6: Actually restore the data

This step takes ages. I found that I needed to avoid parallelism or I received referential integrity errors. Here’s the command which worked – obviously run from the new server, as that’s where I backed up to before. I do this inside tmux in case my SSH connection drops:

pg_restore -Fd -j1 -h localhost -U postgres_admin -d tsdb -S postgres_admin --disable-triggers --exit-on-error --verbose _dump

You need the --disable-triggers flag, and if you have that, you also need the -S flag to specify the superuser to run the disable commands as. We add --exit-on-error so that we don’t lose track of things. When finished, log in to psql, run the post-restore script, and update statistics using ANALYZE – this last step took a while too:

SELECT timescaledb_post_restore();
ANALYZE;

Step 7: Allow external network connections

By default, PostgreSQL doesn’t allow external connections. We can test that by using our connection string from another machine e.g.:

 ~  psql "postgres://functionwriter:<password>@server.example.com:15433/tsdb?sslmode=require"
psql: error: connection to server at "server.example.com" (2606:2800:220:1:248:1893:25c8:1946), port 15433 failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?
connection to server at "server.example.com" (1.2.3.4), port 15433 failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?

To allow external connections, we need to adjust postgresql.conf on the server to listen on our external addresses, and update pg_hba.conf to accept authentication requests from them too.

Set listen_addresses to '*'. First, uncomment the line listen_addresses:

sudo sed -i.bak -E '/^[[:space:]]*#?[[:space:]]*listen_addresses[[:space:]]*=[[:space:]]*/s/^#//' /etc/postgresql/*/main/postgresql.conf

Then, set the value to '*':

sudo sed -i.bak -E "/^[[:space:]]*listen_addresses[[:space:]]*=[[:space:]]*/s/=.*/= '*'/" /etc/postgresql/*/main/postgresql.conf

Next, allow scram-sha-256 authentication from 0.0.0.0/0 for IPv4 and ::/0 for IPv6 by adding them explicitly to pg_hba.conf:

echo -e "# Allow all IP addresses with scram-sha-256 authentication\n# IPv4\nhost    all             all             0.0.0.0/0               scram-sha-256\n# IPv6\nhost    all             all             ::/0                    scram-sha-256" | sudo tee -a /etc/postgresql/*/main/pg_hba.conf

Then restart PostgreSQL:

sudo systemctl restart postgresql
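
Before testing from outside, you can confirm locally that PostgreSQL is now bound to all interfaces rather than just 127.0.0.1 (the port will be whatever you configured; 15433 in my case):

sudo ss -tlnp | grep postgres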

I can test the connection from a different machine:

 ~  psql "postgres://functionwriter:<password>@server.example.com:15433/tsdb?sslmode=require"
psql (16.0, server 15.4 (Ubuntu 15.4-2.pgdg22.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

tsdb=> 

That’s pretty much it! Now i just need to update the connection string in my apps, and i’m good to go.

Next steps

I should probably sort a few other things out:

  • backups – these don’t come for free any more. I need to figure something out, probably using pg_dump again.
  • Install fail2ban with a custom filter for postgresql
  • Automated updates – typically, i’d like new minor versions of TimescaleDB to be installed. Can i automate the psql command? There’s a rough sketch of what that might look like after this list.
  • Automated updates – how to manage major version upgrades?
  • Updates to PostgreSQL – e.g. PostgreSQL 16 just came out.
  • Of course i set up a firewall on Linode, but can i do more?
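
Here’s a rough sketch of what automating the TimescaleDB minor-version update could look like, run after apt has installed a new timescaledb package. It reuses the tsdb database and postgres_admin role from above; i haven’t wired this into cron yet, so treat it as a starting point rather than a tested setup:

#!/bin/bash
set -euo pipefail

# ALTER EXTENSION must be the first statement in a fresh session, so use -X
# (skip .psqlrc) and run each command on its own connection
psql -X -h localhost -U postgres_admin -d tsdb -c "ALTER EXTENSION timescaledb UPDATE;"

# confirm which extension version is now active
psql -X -h localhost -U postgres_admin -d tsdb -c "\dx timescaledb"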

The Multiplier Effect: How The Aggregation of Marginal Delays Derails Projects

The Aggregation of Marginal Gains is an improvement model attributed to Dave Brailsford. When he took over as the head of British Cycling in 2002, the team was near the bottom of the rankings. Brailsford’s approach was to look at their processes in tiny detail, and improve each part by 1%, the logic being that these would all add up to a significant improvement. He talks about shipping mattresses between hotels so that athletes get a better night’s sleep, or fastidiously cleaning the team truck to prevent dust and debris undermining the fine tuning of the bikes. And it was a success – in the 2008 and 2012 Olympics, the team won 70% of the gold medals for track cycling.

I want to propose a contrasting notion – the Aggregation of Marginal Delays – the slow accumulation of tiny lags and delays in a project that add up to a significant slip in delivery performance. These delays are often so small and (at the time) inconsequential that team members just brush them off. Perhaps you need to get approval from three people with busy calendars – it might take you a few days to get into their diaries. Frustrating – but we’re all busy, right? Maybe you need to request something from another team, which takes half a day to be released to you. Annoying – but the other team’s process is clearly laid out on their website – didn’t you read it? That person you needed to get advice from has taken the afternoon off to watch their kids’ nativity play. Who’d begrudge them that?

But these delays, each one small and explainable, add up, both quantitatively and culturally. None of them is worth escalating – by the time you get this in front of someone who could change it, the delay is in the past. But the cumulative impact of a few hours here, half a day there, across dozens of events, across months of work, is significant. And it sets the tone for how things are done – we, as an organisation, start to feel that it’s an acceptable state of affairs – like I said, all of the causes of the delays are reasonable, none of them malicious or the result of incompetence. And because of that, when there are delays which could be avoided, they’re often not.

I’m afraid i don’t have a silver bullet here. We’ve tried lots of things to make the impact of these delays visible, but none have worked. In most cases, the cure is worse than the disease, creating massive overhead. Here’s what we’ve tried:

  • Immediate escalation – but this made people nervous, and often there was nothing to be done – are we really going to summon that team member back from their child’s school play to answer our question?
  • Flagging potential bottlenecks up front – it did help to remind team members to consider when they might have to book things in advance, but too much forward planning is wasteful, and hinders the agility of the team.
  • Pushing accountability down to teams – we already do this as much as we can. But the culture in our part of the organisation, which wants to move fast, is at odds with central IT services providing IT to the wider company, which needs to deliver a secure and reliable service which meets everyone’s needs. So we can’t make all the decisions.
  • Capture data on areas with consistent delays – we tried to use this to systematically improve the processes in those areas, but often the teams which own them weren’t interested in change. They felt our proposed ‘improvements’ would introduce too much risk. And it was quite burdensome to track every request too.

So what can you do? In short, treat it like any other Continuous Improvement exercise – gather data, plan, execute, review.

  • Introduce a system to track delays – partial data is better than none.
  • Hold other teams to their SLAs, and use your data to demonstrate this.
  • Run a “Delay Spotlight” in your team meetings, where team members can raise frustrating delays, and teams can brainstorm improvements. Focus pilots on areas you can control, rather than trying to change central teams.

The Aggregation of Marginal Delays is an inherent challenge in large, complex organisations, but one that we can chip away at through collective analysis, data, communication and a mindset of continuous improvement. Remember that Marginal Gains accumulate too.

FIX: The key(s) in the keyring /etc/apt/trusted.gpg.d/???.gpg are ignored as the file is not readable by user ‘_apt’ executing apt-key

When running apt-get update i was seeing these errors:

W: http://security.ubuntu.com/ubuntu/dists/jammy-security/InRelease: The key(s) in the keyring /etc/apt/trusted.gpg.d/postgres.gpg are ignored as the file is not readable by user '_apt' executing apt-key.

I was getting this error after migrating keys from the old, legacy store to the shiny new one.

A quick inspection shows that the new keys have different permissions to the existing ones:

rob@localhost:~$ ls -alh /etc/apt/trusted.gpg.d/
total 40K
drwxr-xr-x 2 root root 4.0K Oct 13 14:00 .
drwxr-xr-x 8 root root 4.0K Oct 13 13:54 ..
-rw-r--r-- 1 root root 1.2K Sep  6 05:46 akamai-ubuntu-launchpad-ubuntu-ppa.gpg
-rw-r----- 1 root root 3.5K Oct 13 14:00 postgres.gpg
-rw-r----- 1 root root 2.8K Oct 13 14:00 timescaledb.gpg
-rw-r--r-- 1 root root 2.8K Mar 26  2021 ubuntu-keyring-2012-cdimage.gpg
-rw-r--r-- 1 root root 1.7K Mar 26  2021 ubuntu-keyring-2018-archive.gpg
-rw-r--r-- 1 root root 2.3K Oct 13 08:59 ubuntu-pro-cis.gpg
-rw-r--r-- 1 root root 2.2K Oct 13 08:57 ubuntu-pro-esm-apps.gpg
-rw-r--r-- 1 root root 2.3K Oct 13 08:57 ubuntu-pro-esm-infra.gpg

The fix is pretty simple. Pick one of the pre-existing GPG keys, and copy its permissions to all the other keys in the folder. In my case, i chose the ubuntu-pro-cis.gpg key, but you can pick any that doesn’t report the permissions error. Pass it as a --reference argument to the chmod command:

rob@localhost:~$ sudo chmod --reference=/etc/apt/trusted.gpg.d/ubuntu-pro-cis.gpg /etc/apt/trusted.gpg.d/*.gpg
rob@localhost:~$ ls -alh /etc/apt/trusted.gpg.d/
total 40K
drwxr-xr-x 2 root root 4.0K Oct 13 14:00 .
drwxr-xr-x 8 root root 4.0K Oct 13 13:54 ..
-rw-r--r-- 1 root root 1.2K Sep  6 05:46 akamai-ubuntu-launchpad-ubuntu-ppa.gpg
-rw-r--r-- 1 root root 3.5K Oct 13 14:00 postgres.gpg
-rw-r--r-- 1 root root 2.8K Oct 13 14:00 timescaledb.gpg
-rw-r--r-- 1 root root 2.8K Mar 26  2021 ubuntu-keyring-2012-cdimage.gpg
-rw-r--r-- 1 root root 1.7K Mar 26  2021 ubuntu-keyring-2018-archive.gpg
-rw-r--r-- 1 root root 2.3K Oct 13 08:59 ubuntu-pro-cis.gpg
-rw-r--r-- 1 root root 2.2K Oct 13 08:57 ubuntu-pro-esm-apps.gpg
-rw-r--r-- 1 root root 2.3K Oct 13 08:57 ubuntu-pro-esm-infra.gpg
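
Alternatively, since the working keys are simply world-readable (mode 0644, i.e. rw-r--r--), you could set that mode directly instead of copying it from a reference file:

sudo chmod 644 /etc/apt/trusted.gpg.d/*.gpg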

Problem solved!

FIX: Key is stored in legacy trusted.gpg keyring

While running apt-get update I was seeing errors:

W: https://packagecloud.io/timescale/timescaledb/ubuntu/dists/jammy/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.

Although the warning is annoying, it doesn’t stop things updating. I understand the reasons why the legacy keyring is being removed.

Migrate existing keys to the new keyring

First, list the keys:

sudo apt-key list

In my case, i’ve got two – one for PostgreSQL and one for timescaledb. You will probably see a bunch of extra keys here too.

rob@localhost:~$ sudo apt-key list
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
/etc/apt/trusted.gpg
--------------------
pub   rsa4096 2011-10-13 [SC]
      B97B 0AFC AA1A 47F0 44F2  44A0 7FCC 7D46 ACCC 4CF8
uid           [ unknown] PostgreSQL Debian Repository

pub   rsa4096 2018-10-19 [SCEA]
      1005 FB68 604C E9B8 F687  9CF7 59F1 8EDF 47F2 4417
uid           [ unknown] https://packagecloud.io/timescale/timescaledb (https://packagecloud.io/docs#gpg_signing) <support@packagecloud.io>
sub   rsa4096 2018-10-19 [SEA]

Export each key using the last 8 characters of its fingerprint (the short key ID). Because I have two keys to export, i did this twice, giving each key a unique filename under /etc/apt/trusted.gpg.d/:

rob@localhost:~$ sudo apt-key export ACCC4CF8 | sudo gpg --dearmour -o /etc/apt/trusted.gpg.d/postgres.gpg
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
rob@localhost:~$ sudo apt-key export 47F24417 | sudo gpg --dearmour -o /etc/apt/trusted.gpg.d/timescaledb.gpg
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
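
If you want to double-check what ended up in each new keyring before running apt-get update, gpg can list the keys in a file without importing them (a purely optional sanity check; sudo is used here because the exported files may not be world-readable yet):

sudo gpg --show-keys /etc/apt/trusted.gpg.d/postgres.gpg
sudo gpg --show-keys /etc/apt/trusted.gpg.d/timescaledb.gpg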

Bingo – package updates now work! But if they don’t, you might see this error:

W: http://security.ubuntu.com/ubuntu/dists/jammy-security/InRelease: The key(s) in the keyring /etc/apt/trusted.gpg.d/postgres.gpg are ignored as the file is not readable by user '_apt' executing apt-key.

In which case, check out this other post explaining how to solve it.