Use Barman to back up PostgreSQL on an Azure VM to Azure Blob storage

In a previous post, I created a Barman backup script to back up PostgreSQL running in a VM to AWS S3. If you host your PostgreSQL server in Azure, this can get expensive quickly because you pay Microsoft egress bandwidth fees. In this article, I’ll show you how to use Azure Blob storage instead.

  1. Step 1: Install Barman, barman-cli-cloud, snappy etc.
  2. Step 2: Install Azure CLI instead of AWS S3 CLI
  3. Step 3: Install azure-storage-blob Python package
  4. Step 4: Create an Azure storage account and a container
  5. Step 5: Assign a managed identity to the VM and grant it access to the container
  6. Step 6: Test that the credentials work by running a manual backup
  7. Step 7: Create a backup service
  8. Step 8: Create a timer
  9. Step 9: Configure WAL archiving to Azure
  10. Step 10: Test your backup

Step 1: Install Barman, barman-cli-cloud, snappy etc.

See the original article.

Step 2: Install Azure CLI instead of AWS S3 CLI

Follow the instructions on the Microsoft website.
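On Ubuntu/Debian, Microsoft’s one-line install script is usually the quickest route (see their docs for other distros or for a manual repository install):

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash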

Step 3: Install azure-storage-blob Python package

Log in as the postgres user and use pip to install azure-storage-blob:

sudo -u postgres /bin/bash
pip install azure-storage-blob

Step 4: Create an Azure storage account and a container

Follow the Microsoft documentation to create a new storage account in the same region as your VM (to avoid inter-region data transfer fees) and then create a container in it. I used standard storage, enabled only private networks (i.e. to connect from my VM via a private endpoint), and disabled soft deletes. I then created a container called backup. Note that disabling public networks will prevent you from browsing the container – you can change this later via the GUI.

Be sure to create your storage account in the same availability zone as your VM. Microsoft is introducing inter-AZ bandwidth charges for VMs, so it seems likely that similar charges will eventually apply to other services too.
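If you prefer the CLI to the portal, something along these lines creates the account and container – the resource group, account name and region are placeholders, and you’ll still want to set the networking and soft-delete options to taste:

az storage account create --resource-group <your-resource-group> --name <yourstorageaccount> --location <your-vm-region> --sku Standard_LRS --kind StorageV2
az storage container create --account-name <yourstorageaccount> --name backup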

Step 5: Assign a managed identity to the VM and grant it access to the container

Here, we’re basically following the instructions from the Microsoft website.

First, enable a system-assigned managed identity for the VM: go to the VM in Azure and, under “Security”, find the “Identity” panel. Under “System assigned”, set the status to “On”.

Go to the Azure Storage Account you created above. Navigate to the storage account itself (not the container in it), and then to “Access Control (IAM)”. Choose the option to add a role assignment. For the role, select Storage Blob Data Contributor. On the “Members” tab, select “Managed identity” and search for the VM’s identity. Save the assignment.
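If you prefer the CLI, the equivalent is roughly the following – the resource group, VM and storage account names are placeholders:

# Enable the system-assigned identity and capture its principal ID
az vm identity assign --resource-group <your-resource-group> --name <your-vm>
PRINCIPAL_ID=$(az vm show --resource-group <your-resource-group> --name <your-vm> --query identity.principalId -o tsv)

# Grant it Storage Blob Data Contributor on the storage account
STORAGE_ID=$(az storage account show --resource-group <your-resource-group> --name <yourstorageaccount> --query id -o tsv)
az role assignment create --assignee-object-id "$PRINCIPAL_ID" --assignee-principal-type ServicePrincipal --role "Storage Blob Data Contributor" --scope "$STORAGE_ID"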

Step 6: Test that the credentials work by running a manual backup

We need to do this as the postgres user so that we have access to the PostgreSQL database files. First, log in as the postgres user with sudo -u postgres /bin/bash, then run a manual backup:

postgres@pg:~$ barman-cloud-backup -v --cloud-provider azure-blob-storage --azure-credential=managed-identity --snappy -d postgres "azure://yourcontainername.blob.core.windows.net/backup" "server_name"
2023-12-10 12:15:04,993 [327423] INFO: Authenticating to Azure with shared key
2023-12-10 12:15:05,068 [327423] INFO: Request URL: 'https://yourcontainername.blob.core.windows.net/backup?restype=REDACTED&comp=REDACTED'
Request method: 'GET'
Request headers:
'x-ms-version': 'REDACTED'
'Accept': 'application/xml'
'User-Agent': 'azsdk-python-storage-blob/12.19.0 Python/3.10.12 (Linux-6.2.0-1018-azure-x86_64-with-glibc2.35)'
'x-ms-date': 'REDACTED'
'x-ms-client-request-id': 'c0613fee-9755-11ee-a617-979c6065e555'
'Authorization': 'REDACTED'
No body was attached to the request
2023-12-10 12:15:05,138 [327423] INFO: Response status: 200
Response headers:
'Transfer-Encoding': 'chunked'
'Content-Type': 'application/xml'
'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
'x-ms-request-id': 'c33dbf1e-e01e-004d-0a62-2b4fc2000000'
'x-ms-client-request-id': 'c0613fee-9755-11ee-a617-979c6065e555'
'x-ms-version': 'REDACTED'
'Date': 'Sun, 10 Dec 2023 12:15:08 GMT'
2023-12-10 12:15:05,151 [327423] INFO: Starting backup '20231210T121505'
2023-12-10 12:15:05,194 [327423] INFO: Uploading 'pgdata' directory '/mnt/postgres/postgresql/15/main' as 'data.tar.snappy'
2023-12-10 12:15:05,522 [327428] INFO: Upload process started (worker 0)
2023-12-10 12:15:05,523 [327428] INFO: Authenticating to Azure with shared key
2023-12-10 12:15:05,541 [327429] INFO: Upload process started (worker 1)
2023-12-10 12:15:05,542 [327429] INFO: Authenticating to Azure with shared key
2023-12-10 12:15:05,545 [327428] INFO: Uploading 'PG/base/20231210T121505/data.tar.snappy', part '1' (worker 0)
...
2023-12-10 12:19:11,568 [327423] INFO: Backup end at LSN: 63/9A000138 (00000002000000630000009A, 00000138)
2023-12-10 12:19:11,568 [327423] INFO: Backup completed (start time: 2023-12-10 12:15:05.151810, elapsed time: 4 minutes, 6 seconds)
2023-12-10 12:19:11,569 [327429] INFO: Upload process stopped (worker 1)
2023-12-10 12:19:11,569 [327428] INFO: Upload process stopped (worker 0)

From this we can see that the backup was successful, two workers ran, and it took just over 4 minutes. We’re now ready to create our script. Create it as ~postgres/backup-script.sh:

#!/bin/bash

# Variables
BACKUP_DIR="/var/lib/postgresql/backup"
DATE_SUFFIX=$(date +%F_%H-%M-%S)
LOG_FILE="$BACKUP_DIR/barman_backup_log_$DATE_SUFFIX.txt"
AZURE_CONTAINER="azure://yourcontainername.blob.core.windows.net/backup" # Replace with your Azure Blob Storage container URL
HEALTHCHECK_URL="https://hc-ping.com/<slug>"
SERVER_NAME="pg"
RETENTION_POLICY="RECOVERY WINDOW OF 30 DAYS" # Adjust the retention policy as needed
RETAIN_LOG_DAYS=7

# Create the backup temp dir if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Redirect all output to log file
exec > "$LOG_FILE" 2>&1

# Function to send log to healthchecks.io
send_log() {
    local url="$1"
    curl -fsS --retry 3 -m 10 -X POST -H "Content-Type: text/plain" --data-binary "@$LOG_FILE" "$url"
}

# Perform backup with Barman
# don't use verbose (-v) as the output will be too long for healthchecks.io
barman-cloud-backup --cloud-provider=azure-blob-storage --azure-credential=managed-identity --snappy -p 31432 -d postgres "$AZURE_CONTAINER" "$SERVER_NAME" || {
    send_log "$HEALTHCHECK_URL/fail"
    exit 1
}

# Delete old backups according to retention policy
barman-cloud-backup-delete --cloud-provider=azure-blob-storage --azure-credential=managed-identity --retention-policy "$RETENTION_POLICY" "$AZURE_CONTAINER" "$SERVER_NAME" || {
    send_log "$HEALTHCHECK_URL/fail"
    exit 1
}

# Notify healthchecks.io of success and send log
send_log "$HEALTHCHECK_URL"

# Finally, delete old log files in BACKUP_DIR
find "$BACKUP_DIR" -type f -name 'barman_backup_log_*.txt' -mtime +$RETAIN_LOG_DAYS -exec rm -f {} \;

Remember to make it executable with chmod +x backup-script.sh. Create the $BACKUP_DIR directory (in this script it’s /var/lib/postgresql/backup, because /var/lib/postgresql is the home directory of the postgres user): while logged in as postgres, use mkdir -p ~postgres/backup.

Step 7: Create a backup service

We now need to create our backup service. Use this content as /etc/systemd/system/barman-cloud-backup.service:

[Unit]
Description=Barman Cloud Backup Service

[Service]
Type=oneshot
ExecStart=/var/lib/postgresql/backup-script.sh
User=postgres

There’s no need to ‘enable’ the service because it has no [Install] section, i.e. it isn’t started at boot or triggered by another unit – we’ll use a timer later to trigger it. But for now, let’s test it:

sudo systemctl start barman-cloud-backup

You’ll see nothing for a while, then the command prompt will return. We can check the logs in $BACKUP_DIR (configured in the script). Once it finishes, we can use systemctl to check for success:

rob@pg:~$ sudo systemctl status barman-cloud-backup.service
○ barman-cloud-backup.service - Barman Cloud Backup Service
Loaded: loaded (/etc/systemd/system/barman-cloud-backup.service; static)
Active: inactive (dead) since Sun 2023-12-10 12:57:01 UTC; 42s ago
TriggeredBy: ○ barman-cloud-backup.timer
Process: 330631 ExecStart=/var/lib/postgresql/backup-script.sh (code=exited, status=0/SUCCESS)
Main PID: 330631 (code=exited, status=0/SUCCESS)
CPU: 2min 41.835s

Dec 10 12:52:52 pg systemd[1]: Starting Barman Cloud Backup Service...
Dec 10 12:57:01 pg systemd[1]: barman-cloud-backup.service: Deactivated successfully.
Dec 10 12:57:01 pg systemd[1]: Finished Barman Cloud Backup Service.
Dec 10 12:57:01 pg systemd[1]: barman-cloud-backup.service: Consumed 2min 41.835s CPU time.

Finally, we can check on healthchecks.io that the ping was successful. We can also use barman-cloud-backup-list to list the backups:

postgres@pg:~$ barman-cloud-backup-list --cloud-provider=azure-blob-storage --azure-credential=managed-identity azure://containername.blob.core.windows.net/backup pg
Backup ID           End Time                 Begin Wal                     Archival Status  Name
20231210T124733     2023-12-10 12:51:39      00000002000000630000009F
20231210T125252     2023-12-10 12:56:58      0000000200000063000000A2

Step 8: Create a timer

Refer to the original article.
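For reference, a minimal /etc/systemd/system/barman-cloud-backup.timer might look like the sketch below – the 02:00 daily schedule is just an example, so adjust OnCalendar to suit (the original article has the full details):

[Unit]
Description=Daily Barman cloud backup

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target

Enable it with sudo systemctl daemon-reload && sudo systemctl enable --now barman-cloud-backup.timer.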

Step 9: Configure WAL archiving to Azure

See the main article for most of the settings, e.g. archive_mode, wal_level, archive_timeout. We’ll edit archive_command in /etc/postgresql/15/main/postgresql.conf so that each WAL segment is shipped to Azure with barman-cloud-wal-archive:

archive_command = 'barman-cloud-wal-archive --snappy --cloud-provider=azure-blob-storage --azure-credential=managed-identity azure://pgcynexianetbackup.blob.core.windows.net/backup pg %p'

Now restart PostgreSQL:

sudo systemctl restart postgresql@15-main.service

If we browse to the container in the storage account, we can see that it has a folder for the server, then a base folder containing timestamped backups, and a wals folder containing the WAL files.
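If you’d rather check from the VM itself (since public network access is disabled), the Azure CLI can list the blobs using the managed identity – the account name here is a placeholder:

az login --identity
az storage blob list --account-name <yourstorageaccount> --container-name backup --auth-mode login --output table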

Step 10: Test your backup

As always with a backup, test that it works by restoring. Check out my guide to restoring a Barman backup from S3 below – the process is basically the same, and you can adjust the parameters to match the setup and steps above.

How to restore backups of PostgreSQL from S3 using Barman and barman-cloud-backup

In my previous post, I showed how to automate backups of a PostgreSQL database to S3 using Barman and barman-cloud-backup.

  1. Step 1: Verify hardware architecture and PostgreSQL version
  2. Step 2: Install Barman, barman-cli-cloud, AWS CLI, python-snappy
  3. Step 3: Verify S3 connectivity and identify the latest backup
  4. Step 4: Restore the data
  5. Step 5: Configure WAL recovery
  6. Step 6: Start the server
  7. Step 7: Set up scheduled backups and WAL archiving on the new server

Step 1: Verify hardware architecture and PostgreSQL version

For a successful restore, Barman requires that the hardware architecture and PostgreSQL version are both identical. You can verify these with some simple terminal commands:

rob@pg:~$ uname -m # report architecture
x86_64
rob@pg:~$ psql --version # PostgreSQL version
psql (PostgreSQL) 15.5 (Ubuntu 15.5-1.pgdg22.04+1)

Run these on both the source and target. Obviously, if you don’t have the source any more (which is why you’re restoring), you’ll need to make some assumptions…

Step 2: Install Barman, barman-cli-cloud, AWS CLI, python-snappy

Follow the instructions in steps 5-10 of my original article. When you’ve completed the restore, you can pick up from step 11 onwards to automate backups again.

If you no longer have the credentials for your IAM user, log in to the console and generate a secondary key pair. Once you’ve completed the restore, you can swap them and retire the old keys.
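barman-cloud reads the standard AWS shared credentials file, so if you’re setting the new server up from scratch, a minimal ~postgres/.aws/credentials looks like this (the values are placeholders for your key pair):

[default]
aws_access_key_id = <YOUR_ACCESS_KEY_ID>
aws_secret_access_key = <YOUR_SECRET_ACCESS_KEY>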

Step 3: Verify S3 connectivity and identify the latest backup

We can use the barman-cloud-backup-list command to see a list of our backups on S3. This will also verify connectivity.

rob@pg:~/.aws$ barman-cloud-backup-list s3://<bucket>/barman pg
Backup ID           End Time                 Begin Wal                     Archival Status  Name                
20231023T132628     2023-10-23 13:33:25      000000010000004E00000060                                           
20231103T130531     2023-11-03 13:22:11      000000010000005200000081                                           
20231103T135700     2023-11-03 14:11:59      000000010000005200000083                                           
20231107T211340     2023-11-07 21:28:10      0000000100000052000000B1                                           
20231107T230140     2023-11-07 23:12:41      0000000100000052000000B9                                           
20231107T231341     2023-11-07 23:22:42      0000000100000052000000BB                                           
20231107T234029     2023-11-07 23:48:10      0000000100000052000000C1                                           
20231108T060002     2023-11-08 06:12:46      0000000100000052000000C5                                           
20231108T120002     2023-11-08 12:32:50      0000000100000052000000C9                                           
20231108T180002     2023-11-08 18:13:02      0000000100000052000000CD                                           
20231109T000002     2023-11-09 00:26:54      0000000100000052000000D1                                           
20231109T060002     2023-11-09 06:12:19      0000000100000052000000D5                                           
20231109T120002     2023-11-09 12:11:19      0000000100000052000000D9                                           
20231109T180002     2023-11-09 18:12:55      0000000100000052000000DD                                           
20231110T000002     2023-11-10 00:25:49      0000000100000052000000E1                                           
20231110T060001     2023-11-10 06:18:26      0000000100000052000000E5                                           
20231110T120002     2023-11-10 12:15:52      0000000100000052000000E9                                           
20231110T180002     2023-11-10 18:12:24      0000000100000052000000EE                                           
20231111T000002     2023-11-11 00:27:31      0000000100000052000000F0                                           
20231111T060002     2023-11-11 06:13:08      0000000100000052000000F2                                           
20231111T120002     2023-11-11 12:10:25      0000000100000052000000F5  

Eventually we’ll restore 20231111T120002.

Step 4: Restore the data

First, check the data_directory setting on the new server to see where to restore to. In my server setup, I move this to another drive:

rob@new_server:~$ sudo -u postgres psql -c "show data_directory;"
could not change directory to "/home/rob": Permission denied
          data_directory          
----------------------------------
 /mnt/postgres/postgresql/15/main
(1 row)

Stop PostgreSQL then rename the folder:

rob@pg:~$ sudo systemctl stop postgresql
rob@pg:~$ sudo systemctl status postgresql
○ postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Sat 2023-11-11 12:18:06 UTC; 4s ago
   Main PID: 850 (code=exited, status=0/SUCCESS)
        CPU: 3ms

Nov 10 18:11:06 pg systemd[1]: Starting PostgreSQL RDBMS...
Nov 10 18:11:06 pg systemd[1]: Finished PostgreSQL RDBMS.
Nov 11 12:18:06 pg systemd[1]: postgresql.service: Deactivated successfully.
Nov 11 12:18:06 pg systemd[1]: Stopped PostgreSQL RDBMS.
rob@pg:~$ sudo mv /mnt/postgres/postgresql/15/main /mnt/postgres/postgresql/15/main_old

We’re now ready to execute the restore procedure. We do this as the postgres user so that we avoid permission issues: switch to it with sudo -i -u postgres, then run the restore command. 20231111T120002 is the ID of the backup we want to restore:

postgres@new_server:~$ barman-cloud-restore --verbose s3://<bucket>/barman pg 20231111T120002 /mnt/postgres/postgresql/15/main
2023-11-11 12:24:48,875 [218010] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2023-11-11 12:24:49,380 [218010] INFO: Found file from backup '20231111T120002' of server 'pg': barman/pg/base/20231111T120002/data.tar.snappy

Once that completes we can inspect the data folder and see it contains the right data:

postgres@new_server:~$ ls /mnt/postgres/postgresql/15/main
PG_VERSION    base    pg_commit_ts  pg_hba.conf    pg_logical    pg_notify    pg_serial     pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
backup_label  global  pg_dynshmem   pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    pg_wal       postgresql.auto.conf

We now need to copy the config files postgresql.conf, pg_hba.conf and pg_ident.conf to the PostgreSQL configuration directory, /etc/postgresql/15/main. First, we’ll rename the existing files, then copy the restored ones. Do this as the postgres user:

postgres@new_server:~$ mv /etc/postgresql/15/main/postgresql.conf /etc/postgresql/15/main/postgresql.conf.backup
postgres@new_server:~$ cp /mnt/postgres/postgresql/15/main/postgresql.conf /etc/postgresql/15/main/
postgres@new_server:~$ mv /etc/postgresql/15/main/pg_hba.conf /etc/postgresql/15/main/pg_hba.conf.backup
postgres@new_server:~$ cp /mnt/postgres/postgresql/15/main/pg_hba.conf /etc/postgresql/15/main/
postgres@new_server:~$ mv /etc/postgresql/15/main/pg_ident.conf /etc/postgresql/15/main/pg_ident.conf.backup
postgres@new_server:~$ cp /mnt/postgres/postgresql/15/main/pg_ident.conf /etc/postgresql/15/main/

We need to create the recovery signal to tell PostgreSQL to initialise a recovery:

touch /mnt/postgres/postgresql/15/main/recovery.signal

Step 5: Configure WAL recovery

First, we need to disable WAL archiving so that the restored server doesn’t overwrite the bucket contents. Set archive_mode to off – the two commands below cover both the commented and uncommented cases:

sudo sed -i '/^#archive_mode/c\archive_mode = off' /etc/postgresql/*/main/postgresql.conf
sudo sed -i 's/^archive_mode\s*=.*/archive_mode = off/' /etc/postgresql/*/main/postgresql.conf

Just to be sure, we comment out archive_command:

sudo sed -i 's/^\(archive_command\s*=.*\)/#\1/' /etc/postgresql/*/main/postgresql.conf

We then need to set restore_command and recovery_target_timeline in /etc/postgresql/15/main/postgresql.conf. We’ll use the barman-cloud-wal-restore command:

sudo sed -i '/^#restore_command/c\restore_command = '"'"'barman-cloud-wal-restore s3://pg.cynexia.net-backup/barman pg %f %p'"'" /etc/postgresql/*/main/postgresql.conf
sudo sed -i '/^#recovery_target_timeline/c\recovery_target_timeline = '"'"'latest'"'" /etc/postgresql/*/main/postgresql.conf
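Once those have run, the relevant lines in postgresql.conf should read:

restore_command = 'barman-cloud-wal-restore s3://pg.cynexia.net-backup/barman pg %f %p'
recovery_target_timeline = 'latest'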

Step 6: Start the server

Start the server:

sudo systemctl start postgresql

Check the logs with sudo tail -n 100 /var/log/postgresql/postgresql-15-main.log. Fix any configuration errors by referring to the .backup copies of the configuration files you made earlier.
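A quick way to spot config differences is to diff the restored files against the .backup copies, for example:

diff /etc/postgresql/15/main/postgresql.conf.backup /etc/postgresql/15/main/postgresql.conf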

If you get an error like this:
2023-11-11 17:48:15.594 UTC {[local]} [260249] postgres@template1 FATAL: database locale is incompatible with operating system
2023-11-11 17:48:15.594 UTC {[local]} [260249] postgres@template1 DETAIL: The database was initialized with LC_COLLATE "en_US.UTF-8", which is not recognized by setlocale().
2023-11-11 17:48:15.594 UTC {[local]} [260249] postgres@template1 HINT: Recreate the database with another locale or install the missing locale.
You can fix this by installing the correct locale:
sudo locale-gen en_US.UTF-8
sudo update-locale

You can check that the restore was successful by running a query against the database, and checking that the /mnt/postgres/postgresql/15/main/recovery.signal file is gone.
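For example, two quick checks (assuming the paths above):

sudo -u postgres psql -c "SELECT pg_is_in_recovery();"   # should return 'f' once recovery has finished
ls /mnt/postgres/postgresql/15/main/recovery.signal      # should now report 'No such file or directory'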

Step 7: Set up scheduled backups and WAL archiving on the new server

Follow steps 11-13 in my ‘set up backup’ post. This will re-enable WAL archiving to S3 and create a full backup on a schedule.

From RICE to ICE: which framework for your project?

I’ve previously explained the RICE and ICE techniques for prioritisation. Both are frameworks used to evaluate and rank projects or tasks based on their potential impact, feasibility, and difficulty. However, I wanted to highlight the two key differences between them to help you choose the right tool for your project.

The ICE technique (Impact, Confidence, Ease) assigns scores to each project based on the potential impact of the project, the level of confidence in its success, and the ease of implementing it. The scores for each factor are multiplied to get a final score, which is used to rank the projects in order of priority.

The RICE technique (Reach, Impact, Confidence, Effort) takes a similar approach, but adds an additional factor: Reach, the number of people or customers who would benefit from the project. Rather than multiplying everything together, the scores for Reach, Impact and Confidence are multiplied and then divided by Effort to get a final score.

The main difference between the two techniques, therefore, is the inclusion of Reach, which makes RICE particularly useful for marketing campaigns or projects aimed at customer acquisition, i.e. where the breadth of impact is important.

Another difference is that the RICE technique places more emphasis on effort, which refers to the level of resources or time required to implement the project. This can help teams to prioritise projects that are feasible to implement given the available resources.

Technique   Factors                             Calculation                               Purpose
RICE        Reach, Impact, Confidence, Effort   (Reach x Impact x Confidence) / Effort    Projects with potential to reach a large audience or that require significant resources to implement
ICE         Impact, Confidence, Ease            Impact x Confidence x Ease                Smaller projects or tasks that can be implemented more easily

I hope this helps!

Ice, Ice Baby: Chill Out and Prioritise with the ICE Technique

Yesterday, I talked about the RICE technique for prioritisation. Today, I want to introduce the ICE technique, another prioritisation framework used to evaluate and prioritise tasks or projects based on three factors: Impact, Confidence, and Ease. Tomorrow, I’ll compare them both.

  • Impact refers to the potential positive outcome or benefit of completing a particular task or project, considering the potential impact of the task or project on the overall goals or objectives of the organisation or project. For example, is this going to reduce costs? Increase customer loyalty or satisfaction? Reduce developer frustration?
  • Confidence refers to the level of certainty or confidence that the task or project will be successful – factors such as available resources, expertise, and potential roadblocks or obstacles. Are we likely to be able to deliver?
  • Ease refers to the level of difficulty or complexity of completing the task or project, taking account of things like the level of effort required, the time needed, the necessary skills needed, or difficulty obtaining or using resources. Perhaps the project isn’t that hard – but we simply don’t have a developer with the right skills to implement it, or perhaps we can’t support it/keep it running over time.

To use the ICE technique, each item is assigned a score out of 10 for each factor, and the scores are then multiplied together to calculate a final score for each task or project. The higher the final score, the higher we should prioritise completing that item.

This creates a simple yet effective framework which allows us to compare the total potential impact, feasibility, and difficulty. For example, you might use this to prioritise potential new product ideas for a tech startup:

Idea 1: A new mobile app that helps people track their daily water intake and reminds them to stay hydrated throughout the day
  • Impact: 8 – there is a growing awareness of the importance of staying hydrated
  • Confidence: 7 – the team has some experience building mobile apps but this one would require some new features
  • Ease: 8 – the basic features can be implemented quickly

Idea 2: A new software tool that automates social media marketing for small businesses, allowing them to create, schedule and publish posts on multiple platforms with ease
  • Impact: 9 – social media marketing is critical for small businesses but can be time-consuming
  • Confidence: 9 – the team has expertise in social media marketing and has built similar tools in the past
  • Ease: 6 – integrating with multiple social media platforms and providing advanced features will take time and resources

Idea 3: A new AI-powered chatbot that can assist customers with basic support queries, reducing the load on the support team
  • Impact: 7 – many companies are looking for ways to reduce support costs and improve customer satisfaction
  • Confidence: 8 – the team has some experience with chatbot development and has access to AI libraries
  • Ease: 7 – developing the chatbot and integrating it with the company’s support systems will require some time and effort

Using the ICE technique, we would multiply the scores for each idea to get a final score:

Idea 1: 8 x 7 x 8 = 448
Idea 2: 9 x 9 x 6 = 486
Idea 3: 7 x 8 x 7 = 392

Based on these scores, we would prioritise the ideas in the following order:

  1. Idea 2 – social media marketing (486)
  2. Idea 1 – app to track daily water intake (448)
  3. Idea 3 – customer support chatbot (392)

So – our potential startup should probably focus on an app to help small businesses with their social media, then the water intake tracker, and finally the chatbot. This doesn’t take account of the fact that there are already 10,000,000 apps for tracking water intake and I’m not sure how to make money on them, or that social media marketing is a field littered with failed apps.

You want RICE with that?

Imagine that you are a product manager at a software company, and you have three potential features to prioritise for the next development cycle. How do you pick between them? There are many ways, but one I recently learned about is the RICE model – a prioritisation framework used by product managers, teams, and organisations to prioritise projects, features, or tasks based on their potential impact, effort, and other factors. RICE stands for Reach, Impact, Confidence, and Effort, and it provides a quantitative approach to decision-making.

  1. Reach: Reach refers to the number of users, customers, or stakeholders who will be affected by the project or feature over a specific period (e.g., a month or a quarter). It is essential to estimate the reach to understand how many people will benefit from the implementation.
  2. Impact: Impact measures the potential benefit or positive effect that the project, feature, or task will have on users, customers, or stakeholders. Impact is usually measured on a scale, such as 1 (minimal impact) to 3 (significant impact), but the scale can be adjusted to suit the organisation’s needs.
  3. Confidence: Confidence is an estimate of how certain the team is about the reach, impact, and effort assessments. This factor is crucial because it accounts for the inherent uncertainty in making predictions. Confidence is expressed as a percentage, typically ranging from 50% to 100%.
  4. Effort: Effort is an estimate of the amount of time, resources, or work needed to complete the project, feature, or task. Effort can be measured in person-hours, person-days, or any other metric that reflects the resources required to complete the work.

To use the RICE model, you assign values to each of the four factors (Reach, Impact, Confidence, and Effort) for every project, feature, or task under consideration. Then, calculate the RICE score using the following formula:

RICE score = (Reach * Impact * Confidence) / Effort

Projects or features with the highest RICE scores should be prioritised over those with lower scores. This method helps ensure that the team is working on the most valuable and impactful initiatives, while also taking into account the resources and level of certainty associated with each project.

For example:

Feature A: Improve the onboarding process for new users

  • Reach: 1000 users per month
  • Impact: 3 (high impact, as it can significantly improve user retention)
  • Confidence: 90% (high confidence in estimates and potential outcome)
  • Effort: 200 person-hours

Feature B: Implement a dark mode theme

  • Reach: 300 users per month
  • Impact: 2 (moderate impact, as it enhances user experience)
  • Confidence: 80% (fairly confident in the estimates)
  • Effort: 100 person-hours

Feature C: Optimise backend performance

  • Reach: 500 users per month
  • Impact: 1 (low impact, as most users won’t notice the difference)
  • Confidence: 70% (uncertain about the exact impact and effort)
  • Effort: 150 person-hours

Now calculate the RICE scores for each feature:

Feature A RICE score = (1000 * 3 * 0.9) / 200 = 13.5
Feature B RICE score = (300 * 2 * 0.8) / 100 = 4.8
Feature C RICE score = (500 * 1 * 0.7) / 150 = 2.33

Based on the RICE scores, the priority order for these features should be:

  1. Feature A: Improve the onboarding process for new users (13.5)
  2. Feature B: Implement a dark mode theme (4.8)
  3. Feature C: Optimise backend performance (2.33)

Using the RICE model, you can see that Feature A should be the top priority, as it has the highest potential impact on users with a reasonable amount of effort.

Tomorrow, I’ll explain the ICE technique.

Are you A senior developer, or THE lead developer

In our world, we organise in Pods – an autonomous group of 6-9 people with all the skills needed to solve a problem. Multiple Pods form a Team. Within a Pod, there can be multiple Senior Developers, but only a single Lead Developer. They have different and overlapping responsibilities and accountabilities.

Every project must have exactly one Lead Developer, and one or more Senior Developers.

It is the accountability of the project or product manager to ensure that these roles exist in the team, and that they are filled by skilled team members able and willing to fulfil them.

A Senior Developer

Every project must have at least one Senior Developer who has:

  • high competence in the core technologies used in the project
  • reasonable competence in all technologies used in the project
  • a willingness to learn, taking action to generate opportunities for learning
  • an understanding of, and the ability to explain, high-level architectural principles
  • the ability to generate implementable steps through the selection of appropriate patterns.

They are responsible for the following activities:

  • performing code and design reviews against industry and company best practice, defined implementation plans, etc.
  • demonstrating “technical common sense” to ensure the team is producing clean, supportable, sustainable products
  • embodying best practice around software engineering and software delivery, including testing, automation, deployment, etc. For example: what technical standards are to be followed? How will we handle branching? Code reviews?
  • actively participating in the creation of detailed designs
  • helping the wider organisation through activities like lunch and learns, pattern generation, training, etc.
  • actively mentoring less experienced developers, typically spending 10-30% of their time on this alone.

You are probably a “Senior Developer” if team members keep asking you how to do things. We have the same expectations of Staff and Contractor Senior Developers, including that they spend significant time coaching and developing others.

The Lead Developer

In our projects, we expect the most senior developer to take on the role of “Lead Developer”. This role entails more leadership activities – the Lead Developer is accountable for:

  • being a “Senior Developer”, i.e. the Lead Developer also does all of the things that a Senior Developer does
  • ensuring that the Developers and Senior Developers are fulfilling their responsibilities
  • creating the detailed technical designs necessary for implementation
  • working closely with architecture to ensure continuity and coherence between detailed technical designs and the high-level solution and reference architecture
  • assisting with planning and scoping of work, including helping design the delivery team
  • assisting with interviewing
  • meeting with senior management (IT or commercial) to ensure a proper understanding of the project, delivery, etc. on both sides
  • leading the development team and creating clarity of vision, design and expectations

The Lead Developer role means spending less time actually writing code – in some weeks you might spend 30% or less of your time coding, depending on the stage of the work.