The effects of batch size and linger time on Kafka throughput

By default, the Kafka producer attempts to send records as soon as possible, allowing up to max.in.flight.requests.per.connection unacknowledged requests per connection. If you produce faster than the connection can absorb, records accumulate in the producer’s buffer and are grouped into batches; if that buffer fills up with unacknowledged batches, calls to send() block until space becomes available.

We can optimise the way the producer batches messages to achieve higher throughput through two settings, batch.size and linger.ms.

linger.ms is the number of milliseconds that the producer waits for more messages before sending a batch. By default, this value is 0, meaning that the producer attempts to send messages immediately. Smaller batches have higher overhead – they are less compressible, and there’s an overhead associated with processing and acknowledging each one. Under moderate load, messages may not arrive frequently enough to fill a batch immediately, but by introducing a small delay (e.g. 50ms), we increase the likelihood that the producer can batch more messages together, improving throughput.

The producer will always send a batch as soon as it’s full, regardless of linger.ms. The other configuration we can consider is batch.size. As the name suggests, this controls the maximum size of a batch in bytes (per partition) – so by increasing it, we make it possible to send more messages in a single request.
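
To make this concrete, here’s roughly how the two settings are applied when constructing a producer. This is a minimal sketch using the kafka-python client – the broker address, topic and exact values are assumptions, and you should tune them for your own workload:

from kafka import KafkaProducer

# Hypothetical values: 64 KB batches, and wait up to 50 ms for a batch to fill
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    batch_size=64 * 1024,     # maximum batch size in bytes, per partition
    linger_ms=50,             # wait up to 50 ms for more records before sending
    compression_type="gzip",  # larger batches compress better
)

for i in range(10_000):
    producer.send("events", f"message {i}".encode())
producer.flush()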

We did some testing on this, and found that for our message size and volume the throughput gains level off at around 75-100ms of linger time:

[Chart: steep throughput improvements for 3 different batch sizes, levelling off at around 100ms of linger time]

Summary

  • Increasing the producer’s ability to batch – by raising linger.ms (e.g. we found ~75–100 ms worked best for our workload) and increasing batch.size – significantly improved throughput because it lets Kafka send larger, more-compressible batches and amortise per-request overhead.
  • But this is a trade-off – you add up to linger.ms of extra latency for the first message in a batch, and very large batches increase memory pressure and recovery cost on retries.
  • Testing different values will allow you to optimise throughput for your own workload.

Using Claude Code for things that aren’t code – product comparisons

Claude Code is pretty amazing. It’s let me build prototypes and improve apps faster than I ever thought possible. But I was wondering – what else can it do?

I’ve been using Emby for years, but recently I started to wonder what happened to Jellyfin, the project that forked from Emby years ago. So I decided to ask Claude Code to help me by setting up a debate and moderating the answer.

Setting up the debate

Here’s the prompt I used:

Today's date is 1st October 2025. I want to compare emby and jellyfin.  create two agents, one each, and require them to make the case to each other about why their product is better. make sure their info is up to date by e.g. letting them searching the web. Give them two or three attempts to make their case. They talk to each other via a shared, written .md file. At the end, you summarise the discussion and make a recommendation. Keep all the working files. Let them use subfolders of this folder to create their own private note.

Pretty basic stuff. It got to work, setting up a task list:

☐ Set up project structure (create folders and initial debate file)
☐ Launch Emby advocate agent for initial research and first argument
☐ Launch Jellyfin advocate agent for initial research and first argument
☐ Facilitate round 2 of debate
☐ Facilitate round 3 of debate
☐ Summarize debate and provide recommendation

The folder structure was quite basic:

❯ tree
.
├── debate.md
├── emby-notes
│   └── research.md
└── jellyfin-notes
    └── research.md

3 directories, 3 files

debate.md has the following structure:

debate template
# Emby vs Jellyfin Debate
**Date:** October 1st, 2025

## Debate Format
Each advocate will present their case in rounds, responding to their opponent's arguments and highlighting their platform's strengths.

---

## Round 1: Opening Arguments
### Emby Advocate's Opening Statement
*[To be added during first round]*

### Jellyfin Advocate's Opening Statement
*[To be added during first round]*

---
## Round 2: Rebuttals and Deeper Analysis

### Emby's Rebuttal

*[To be added during second round]*

### Jellyfin's Rebuttal
*[To be added during second round]*

---
## Round 3: Final Arguments
### Jellyfin's Closing Statement
*[To be added during final round]*

### Emby's Closing Statement
*[To be added during final round]*

---
## Summary and Recommendation
*[To be added after all rounds complete]*

Three rounds down

It then fired up the agents. Each wrote a long (150-200 line) summary of its findings. The first round of these was pretty basic stuff – project backgrounds, features etc. – and then they wrote their arguments into the document. There were some zingers in there:

  • “When Jellyfin advocates celebrate reviving clients “dormant for 5 years,” Emby users simply enjoy apps that never went dormant because sustainable funding ensures continuous maintenance.”
  • “Let’s address the elephant in the room: hardware transcoding performance” – in subsequent rounds the Emby bot went on to list hardware acceleration bugs from Jellyfin’s GitHub issues while talking about its own “streamlined GPU acceleration that simply works”.
  • “version 10.11.0 is on track for release at the end of April 2025” – in round 2 the Emby bot noted “it’s now October 2025, and Jellyfin is still stuck in release candidate phase … following a critical library-breaking bug”

The debate went on for 3 rounds – about 30 minutes – with 25 web searches, producing around 4,000 words. At the end, the moderator produced a summary:

This debate revealed fundamental philosophical differences between two capable media server solutions, each with distinct strengths and trade-offs.

Emby’s Core Arguments:

  • Professional development model ensures consistent updates and accountability ($119 lifetime, half of Plex’s $249.99)
  • Superior hardware transcoding that “just works” out of the box (20+ simultaneous streams)
  • Native platform support across devices without complex configuration
  • Sustainable funding model that guarantees continuous maintenance
  • Time value proposition: paying modest fee saves hundreds of hours of troubleshooting

Jellyfin’s Core Arguments:

  • Completely free and open-source with no artificial feature restrictions
  • Strong community support (ranked #1 on Slant vs Emby at #5)
  • Complete privacy and transparency with no tracking or telemetry
  • Rapid development with significant performance improvements (100x in some areas)
  • Philosophical commitment to software as a public good

Key Contested Points:

  1. Reliability: Emby highlighted Jellyfin’s 6-month delayed 10.11.0 release with ongoing RC issues. Jellyfin countered that RC testing protects production users while Emby shipped breaking updates to paying customers.
  2. Hardware Transcoding: Emby emphasized simplicity while Jellyfin stressed capability and transparency of documentation.
  3. Sustainability: Emby argued professional funding ensures consistency. Jellyfin claimed community support was so strong they halted donations.
  4. Support: Emby offers professional support channels. Jellyfin relies on community forums and volunteers.

And the most cop-out recommendation possible:

After analyzing the debate, the choice depends entirely on your priorities and technical expertise.

It explained its rationale, before concluding:

Final thought: Start with Jellyfin. If you encounter insurmountable issues or find the setup too complex, Emby’s $119 lifetime license remains available as a fallback option.

I mean – what is that!?!

Analysing the debate

Claude Code produces jsonl files for each agent invocation. These contain the line-by-line record of each model invocation. They’re stored in the ~/.claude/projects folder:

❯ ls -alh ~/.claude/projects/-Users-rob-Downloads-jellyfin-vs-emby
total 2056
drwx------@  3 rob  staff    96B  1 Oct 22:56 .
drwxr-xr-x  13 rob  staff   416B  1 Oct 21:42 ..
-rw-r--r--@  1 rob  staff   991K  1 Oct 21:40 91fccf65-7baf-444f-8a3f-5a04251d2329.jsonl

So, I did what any curious mind would do – I asked Claude Code to analyse them and pull out any interesting facts. It produced a bunch of charts and diagrams – word counts and the like.

Among the analysis it produced was this “evidence score calculation”:

  The evidence score (0-1 scale) is calculated based on:

  • Statistical evidence (20% per instance): Numbers, percentages, dollar amounts
  • Specific claims (15% per instance): Version numbers, release dates
  • Technical details (10% per instance): GPU, CPU, codecs, transcoding mentions
  • Base content score (30%): Added if content > 500 characters

I’m not sure I agreed with the basis – surely it should have validated the claims – but it showed that both agents scored poorly in round 1, peaked in round 2, and remained high through round 3. Fascinating.

It then performed a topic distribution analysis:

  • Cost/Price: $, price, cost, free, subscription, lifetime, pay, fee, expensive, affordable, 119, 249
  • Performance: transcode, hardware, GPU, CPU, stream, performance, speed, 4K, HDR, HEVC, codec, acceleration
  • Philosophy: freedom, open, source, community, proprietary, transparent, privacy, control, ideology, lock-in
  • Support: support, help, documentation, fix, bug, issue, troubleshoot, forum, professional, volunteer
  • Features: feature, capability, function, update, version, release, client, platform, compatibility, device

with the following summary:

  1. Stark Philosophical Divide:
    • Jellyfin: 40% of mentions are philosophy-related (freedom, open-source, community)
    • Emby: Only 14.7% philosophy, focuses on practical matters
  2. Technical vs Ideological:
    • Emby’s top focus: Performance (27.3%) and Support (25.9%)
    • Jellyfin’s top focus: Philosophy (40.0%) dominates everything else
  3. Balanced on Cost:
    • Both agents mention cost/price equally (23 mentions each)
    • Shows this was a key battleground topic
  4. Support Strategy Difference:
    • Emby emphasizes professional support (37 mentions, 25.9%)
    • Jellyfin downplays support (19 mentions, 13.1%)

[Chart: grouped bar chart comparing topic focus between the Emby (orange) and Jellyfin (blue) agents across 5 categories, showing keyword mention counts from the debate transcript]

  • Cost/Price: Emby 23 mentions (16.1%), Jellyfin 23 mentions (15.9%) – equal focus
  • Performance: Emby 39 mentions (27.3%), Jellyfin 25 mentions (17.2%) – Emby emphasises more
  • Philosophy: Emby 21 mentions (14.7%), Jellyfin 58 mentions (40.0%) – Jellyfin’s dominant focus
  • Support: Emby 37 mentions (25.9%), Jellyfin 19 mentions (13.1%) – Emby emphasises more
  • Features: Emby 23 mentions (16.1%), Jellyfin 20 mentions (13.8%) – roughly equal

I’ve uploaded the entire debate to GitHub, including the transcript and this human-readable version. I didn’t bother adding the 5,000 lines of autogenerated scripts which produced these charts. Further analysis of the transcript is left as an exercise for the reader.

What’s next?

This was just a bit of fun, but I did it to test how useful LLMs can be in helping make technical or product decisions – and the answer is: not really, or at least not just yet, and not with a trivial prompt.

Yes, they can compile a succinct list of the differentiating talking points better than most humans, but ultimately (at least in this case) they waffle on and end with a diplomatic but unhelpful non-answer.

This sort of approach could help with lots of complex decisions – choosing a database, picking a framework in a new project, narrowing down a list of restaurant recommendations. In all of these cases, the process benefits from a strong reward function – what criteria are most important to us in this decision?

Perhaps I should spend some time tweaking the prompt to force a more meaningful end result. Until then, I’ll just base all my decisions “entirely on my priorities and technical expertise”.

Fix: `Error: Cannot find module @rollup/rollup-linux-x64-gnu. npm has a bug related to optional dependencies` in CI

Although builds were succeeding locally, CI builds on Linux were failing with the error Error: Cannot find module @rollup/rollup-linux-x64-gnu. npm has a bug related to optional dependencies. The log suggested removing package-lock.json – but that obviously breaks deterministic builds.

The root cause is a missing optional dependency: Rollup ships platform-specific native binaries for performance, and a lockfile generated on a Mac won’t include the Linux-specific optional dependency. The issue was first noted in npm bug 4828 and fixed in npm 11.3.0+, but adding an explicit optional dependency prevents the remaining edge cases.

The fix is:

  • Add @rollup/rollup-linux-x64-gnu: "*" to optionalDependencies in package.json (see the snippet below)
  • Update package-lock.json to include the platform-specific dependency
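
For illustration, the relevant fragment of package.json would look something like this (the wildcard version is an assumption – pin it if you prefer); running npm install afterwards regenerates package-lock.json with the Linux-specific entry included:

{
  "optionalDependencies": {
    "@rollup/rollup-linux-x64-gnu": "*"
  }
}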

This allows builds to work reliably across platforms.

Protocol level integrity guarantees in Kafka

I was recently asked to design a method to meet ITAC (IT Application Controls) standards for critical data flows in our organisation. ITACs are application-level controls concerned mainly with how we ensure the completeness, accuracy and validity of transactions – for example, invoices or trades. Our control set is based on the ICFR principles, of which ITAC is one part.

The specific control objective I was asked to look at relates to the risk of loss of integrity of financial data transfers. The focus on the integrity of the data – not authenticity or non-repudiation – is really important, as it means cryptographic solutions aren’t required: Kafka’s native protocol features can satisfy the requirements.

Traditionally, we would compute a client-side checksum over the key elements of the payload (e.g. if this is an invoice, the customer, invoice total, number of line items etc.) and send that with the message. This checksum could then be recomputed on the receiving end regardless of any format transformation that happens in the middleware (e.g. converting JSON to XML to feed a legacy system).

Why this is hard to apply in Kafka

These ITAC standards were developed around “classic” middleware like BizTalk, which operates fundamentally differently from Kafka. BizTalk functions as an ETL (Extract, Transform, Load) process where data transformation and potential loss are expected parts of the workflow. Kafka, however, operates more like a database log – it’s an append-only, read-only system once messages are written.

This architectural difference is significant. Most integrity risks arise from the ETL nature of traditional middleware, but this risk profile doesn’t apply to Kafka’s immutable message model.

How Kafka guarantees integrity

ITAC breaks down integrity guarantees into two core checks:

1. Completeness Controls

Standard Requirement: Implement reconciliation-based checks and interface failure monitoring.

Kafka’s Native Solution: Kafka’s “at least once” delivery guarantee is built directly into the protocol. When both producers and consumers are properly configured (which they are by default), this eliminates the need for additional completeness controls, as we know that messages will be re-delivered to the consumer until the consumer acknowledges them.
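
As a rough sketch of what “properly configured” means at the client level – using the kafka-python client here as an assumption (the Java and librdkafka clients expose equivalent settings) – the producer waits for acknowledgement from all in-sync replicas, and the consumer commits an offset only after it has processed the message:

from kafka import KafkaProducer, KafkaConsumer

def process(message):
    """Placeholder for real processing logic (e.g. posting an invoice)."""
    print(message.value)

# Producer: a send only succeeds once all in-sync replicas have the batch
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    acks="all",
    retries=5,
)

# Consumer: commit offsets manually, and only after processing succeeds,
# so an unacknowledged message will be re-delivered after a failure
consumer = KafkaConsumer(
    "invoices",                          # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="finance-feed",             # hypothetical consumer group
    enable_auto_commit=False,
)
for message in consumer:
    process(message)
    consumer.commit()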

For monitoring (i.e. can we tell whether an application is consistently failing?), OpenTelemetry (OTEL) provides sufficient coverage, though we may need to fine-tune client OTEL configurations. Some clients process massive data volumes in a very short period of time, which can cause them to drop telemetry based on available memory and throughput capacity, creating random gaps in the telemetry. Implementing sampling that prioritises delivery of error events would address this concern.

2. Accuracy Controls

Traditional Challenge: There are a number of accuracy risks stemming from ETL middleware’s data transformation capabilities. For example, how do we know the transformation logic is correct, or that it executes with ACID guarantees?

Kafka’s Advantage: Messages cannot be altered once sent, and end-to-end accuracy is already embedded in the Kafka protocol through a three-layer approach:

Producer Side:

  • Generates batches with RecordHeader metadata
  • Includes record count and batch-level CRC (checksum) calculated by the producer
  • Ensures data integrity from the source

Broker Side:

  • Multiple mechanisms record and test batch CRC with automatic index and log rebuilding on integrity failures
  • Validation rule: “A message entry is valid if the sum of its size and offset are less than the length of the file AND the CRC32 of the message payload matches the CRC stored with the message”, leading to automatic log truncation to the last valid offset when corruption is detected.
  • The broker serves entire batches to consumers (no selective message retrieval) – therefore as long as the batch CRC matches the CRC from the producer, we can be very sure that the content is correct.

Consumer Side:

  • Consumers can re-validate the CRC32 of the records they consume before handing them to the application (the Java client exposes this as check.crcs, enabled by default), closing the loop from producer to broker to consumer.
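
As an illustration, this check is exposed directly in client configuration – a minimal sketch with the kafka-python client, shown explicitly even though it defaults to on:

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "invoices",                  # hypothetical topic
    bootstrap_servers="localhost:9092",
    check_crcs=True,             # verify the CRC32 of the records consumed
)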

The Bottom Line

Kafka’s architecture inherently satisfies ITAC integrity requirements without additional controls. The protocol’s built-in checksums, delivery guarantees, and immutable message design provide the completeness and accuracy controls that ITAC standards require.

When using Kafka, we should focus on making use of the application semantics offered by Kafka – such as eventual consistency and an event-driven architecture – rather than spending our time implementing additional integrity layers: the controls are already there, working at the protocol level.

How to: use custom / self-signed certificates with Requests in Python

In a previous post, I explained how to configure Azurite to use a self-signed certificate to enable OAuth authentication. One challenge with this method is that the Azure Python SDK will refuse to connect to Azurite, reporting errors such as:

azure.core.exceptions.ServiceRequestError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)

This is because the Azure SDK uses the Requests library, which in turn uses Certifi as its source of root certificates. Certifi provides a regularly updated bundle of the Mozilla root trust store, and our self-signed custom certificate obviously isn’t in Mozilla’s trust list!

Making Requests trust our self-signed certificate

You can get around this by setting the REQUESTS_CA_BUNDLE (or CURL_CA_BUNDLE) environment variable, which tells Requests to use that bundle instead of Certifi’s built-in one – but if we just point it at our self-signed root certificate then we lose access to the rest of the trust store, which makes connecting to public APIs difficult.

One option is to copy the certifi trust store to a location in our workspace and inject our certificate, then set the environment variable to point at this new store, but that’s more config to manage, and this is just local development – we don’t need the extra hassle.

Injecting our certificate into Certifi’s trust store

So my preferred method is to inject my self-signed certificate into Certifi’s trust store directly, and I created the following bash script to do this. First it checks you’re in a virtual environment, then checks that certifi is installed (it uses pip for this, but it will work alongside other package managers). Finally, it extracts the first line of the base64-encoded certificate and checks that it isn’t already in the certificate bundle before injecting it, along with some metadata to help you figure out where it came from.

#!/bin/bash
# Function to log errors and exit
log_error() {
    echo "❌ Error: $1"
    exit 1
}
# Check if custom certificate path is provided
CUSTOM_CERT="$1"
if [ -z "$CUSTOM_CERT" ]; then
    log_error "Please provide the path to the custom certificate as an argument."
fi
# Check if the custom certificate file exists
if [ ! -f "$CUSTOM_CERT" ]; then
    log_error "Custom certificate file not found at $CUSTOM_CERT."
fi
# Check if in a virtual environment
if python -c 'import sys; sys.exit(0 if sys.prefix != sys.base_prefix else 1)'; then
    echo "✅ Virtual environment detected: $(python -c 'import sys; sys.stdout.write(sys.prefix)')"
else
    log_error "Not in a virtual environment. Please activate a virtual environment before running the script."
fi
# Check if certifi is installed
if ! python3 -m pip show certifi >/dev/null 2>&1; then
    log_error "certifi is not installed. Please install it using 'python3 -m pip install certifi'."
fi
# Get the path to the certifi certificate file
CERTIFI_CERT=$(python3 -m certifi) || log_error "Failed to get certifi certificate path."
if [ ! -f "$CERTIFI_CERT" ]; then
    log_error "Certifi certificate file not found at $CERTIFI_CERT."
fi
# Extract the first line of the public key from the custom certificate as the unique identifier
UNIQUE_IDENTIFIER=$(awk '/BEGIN CERTIFICATE/{getline; print}' "$CUSTOM_CERT")
echo "Custom certificate: $CUSTOM_CERT"
echo "Certifi certificate file: $CERTIFI_CERT"
echo "Unique identifier: $UNIQUE_IDENTIFIER"
# Check if the custom certificate is already in certifi's certificate file, and append if not
if grep -q "$UNIQUE_IDENTIFIER" "$CERTIFI_CERT"; then
    echo "✅ Custom certificate is already present in certifi's certificate file."
else
    echo "Appending custom certificate..."
    # Extract additional certificate information
    ISSUER=$(openssl x509 -in "$CUSTOM_CERT" -noout -issuer | sed 's/issuer=//')
    SUBJECT=$(openssl x509 -in "$CUSTOM_CERT" -noout -subject | sed 's/subject=//')
    LABEL=$(openssl x509 -in "$CUSTOM_CERT" -noout -subject | awk -F 'CN=' '{print $2}' | cut -d, -f1)
    SERIAL=$(openssl x509 -in "$CUSTOM_CERT" -noout -serial | sed 's/serial=//')
    MD5_FINGERPRINT=$(openssl x509 -in "$CUSTOM_CERT" -noout -fingerprint -md5 | sed 's/MD5 Fingerprint=//')
    SHA1_FINGERPRINT=$(openssl x509 -in "$CUSTOM_CERT" -noout -fingerprint -sha1 | sed 's/SHA1 Fingerprint=//')
    SHA256_FINGERPRINT=$(openssl x509 -in "$CUSTOM_CERT" -noout -fingerprint -sha256 | sed 's/SHA256 Fingerprint=//')
    TIMESTAMP=$(date -u +"%Y-%m-%d %H:%M:%S UTC")
    # Append the certificate with metadata
    {
        echo
        echo "# Certificate added by script at $TIMESTAMP"
        echo "# Issuer: $ISSUER"
        echo "# Subject: $SUBJECT"
        echo "# Label: \"$LABEL\""
        echo "# Serial: $SERIAL"
        echo "# MD5 Fingerprint: $MD5_FINGERPRINT"
        echo "# SHA1 Fingerprint: $SHA1_FINGERPRINT"
        echo "# SHA256 Fingerprint: $SHA256_FINGERPRINT"
        cat "$CUSTOM_CERT"
    } >> "$CERTIFI_CERT" || log_error "Failed to append custom certificate to certifi."
    echo "✅ Custom certificate appended to certifi's certificate file."
fi
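
To sanity-check that the injection worked, you can confirm that Requests now trusts the certificate. Here’s a quick sketch – the Azurite endpoint is an assumption based on my earlier post, and any HTTPS endpoint served with a minica-issued certificate will do:

import certifi
import requests

# The bundle the script appended to
print(certifi.where())

# This should no longer raise CERTIFICATE_VERIFY_FAILED – a 4xx response is fine,
# it just means the TLS handshake succeeded and Azurite rejected the anonymous request
resp = requests.get("https://127.0.0.1:10000/devstoreaccount1?comp=list")
print(resp.status_code)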

Drawback – Certifi gets updated

This approach (or either approach, really) has one major drawback – it needs to be run every time Certifi is updated, which is roughly monthly. So, I decided to run the script as a launch task in VSCode so that I can attach it to the “debug” tasks that are already embedded in my workflow. While I was at it, I chained a poetry install task to ensure that my dependencies are fully installed. I added the task in VSCode’s tasks.json to run the script after installing the packages (also note I’ve replaced pip with poetry in the package task):

{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "func host start",
      "type": "func",
      "command": "host start --verbose --script-root ${workspaceFolder}/app",
      "options": {
        "env": {
          "PYTHON_ENABLE_DEBUG_LOGGING": "1"
        }
      },
      "problemMatcher": "$func-python-watch",
      "isBackground": true,
      "dependsOn": "add minica cert to certifi"
    },
    {
      "label": "add minica cert to certifi",
      "type": "shell",
      "osx": {
        "command": "${workspaceFolder}/azurite/add_minica_cert_to_certifi.sh ${workspaceFolder}/azurite/minica.pem"
      },
      "linux": {
        "command": "${workspaceFolder}/azurite/add_minica_cert_to_certifi.sh ${workspaceFolder}/azurite/minica.pem"
      },
      "problemMatcher": [],
      "dependsOn": "pip install (functions)"
    },
    {
      "label": "pip install (functions)",
      "type": "shell",
      "osx": {
        "command": "poetry install --with dev --no-interaction --sync --no-root"
      },
      "linux": {
        "command": "poetry install --with dev --no-interaction --sync --no-root"
      },
      "problemMatcher": []
    }
  ]
}

Finally, I added it as a preLaunchTask in launch.json so that it runs when I manually debug too:

{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Debug Tests",
      "type": "debugpy",
      "request": "launch",
      "program": "${file}",
      "purpose": [
        "debug-test"
      ],
      "console": "integratedTerminal",
      "justMyCode": true,
      "stopOnEntry": false,      
      "env": {
        "PYTEST_ADDOPTS": "--no-cov",
        "PYTHONPATH": "${workspaceFolder}",
        "LOG_LEVEL": "DEBUG"
      },
      "preLaunchTask": "add minica cert to certifi"
    },
    {
      "name": "Python Debugger: Current File",
      "type": "debugpy",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal",
      "preLaunchTask": "add minica cert to certifi"
    },
    {
      "name": "Attach to Python Functions",
      "type": "debugpy",
      "request": "attach",
      "connect": {
        "host": "localhost",
        "port": 9091
      },
      "preLaunchTask": "func host start"
    }
  ]
}

Conclusion

By injecting our self-signed certificate into Certifi’s trust store, we’ve created a solution that:

  1. Maintains the integrity of the existing trust store
  2. Avoids the need for environment variable management
  3. Integrates seamlessly with our development workflow in VSCode

Although this approach requires us to run the script whenever Certifi is updated, by using a VSCode task configuration we can ensure that this happens automatically, and link it to our existing debug tasks.

Remember, this method is intended for development purposes. In production environments, always use properly signed certificates from trusted certificate authorities.

How to: use Azurite with self-generated certificates for HTTPS in a Codespace or Devcontainer

I’ve been using Azurite to simulate Azure storage for my development. If you’re not familiar with it, Azurite is a local storage emulator for Azure Storage, and you can read my other post about how I’ve set up my devcontainer configuration to run Azurite as a service container. As my deployed code uses an Azure Managed Identity, I wanted to ensure my development environment was consistent with this and also uses Azure’s DefaultAzureCredential credential provider class. In this post, I will talk through the steps required to switch from using a connection string (with a well-known account and key) to using OAuth and HTTPS, helping to increase feature parity between development and production and reducing the chance of mistakes.

There are essentially 5 steps:

  1. Create a local Certificate Authority
  2. Configure Azurite to use the certificate (and enable OAuth with basic checking)
  3. Configure the devcontainer to trust the certificates
  4. Configure the local credential for OAuth
  5. Configure Azure Storage Explorer to trust the minica root certificate

Obviously, this is only useful for development, and you shouldn’t use this to secure services running directly on the internet.

Create a local Certificate Authority

The first hurdle is to set up a CA, and issue a certificate for Azurite to use. By far the simplest way is to use minica – a simple CA which generates a root certificate and any number of other certificates. The other tool I found is mkcert, but I didn’t try it.

We could set this up so that the certificates are regenerated every time we rebuild the devcontainer, but the minica certificates are valid for over two years, so it’s probably not worth it. Instead, just install minica on your local machine and generate the certificates, which we can then copy into the repo. There are installation instructions on the minica repo – I did this on my Mac:

~/Downloads > brew install minica
==> Downloading https://ghcr.io/v2/homebrew/core/minica/manifests/1.1.0
Already downloaded: /Users/rob/Library/Caches/Homebrew/downloads/291ff83573a0a9e0a7033accd18d58fcf701211c2b7b63a31e49c36fabc0cb5f--minica-1.1.0.bottle_manifest.json
==> Fetching minica
==> Downloading https://ghcr.io/v2/homebrew/core/minica/blobs/sha256:dc8955ffd5c34b8eaedbc556e71188ec55c2a01e76c26f853aeb0038c7ac2426
############################################################################################################################################################################################################### 100.0%
==> Pouring minica--1.1.0.arm64_sonoma.bottle.tar.gz
🍺  /opt/homebrew/Cellar/minica/1.1.0: 6 files, 4.3MB
==> Running `brew cleanup minica`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
~/Downloads > mkdir azurite-certs
~/Downloads > cd azurite-certs
~/Downloads/azurite-certs > minica -ip-addresses 127.0.0.1
~/Downloads/azurite-certs > tree
.
├── 127.0.0.1
│   ├── cert.pem
│   └── key.pem
├── minica-key.pem
└── minica.pem
2 directories, 4 files

The minica-key.pem and minica.pem files are the CA’s private and public keys respectively. The 127.0.0.1 folder contains the private key and certificate for the hostname 127.0.0.1.

Be sure to use the argument -ip-addresses and not -domains – Node requires that IP addresses are present in the Subject Alternative Name (SAN) field of certificates. If you accidentally use the -domains option, you’ll get an ERR_TLS_CERT_ALTNAME_INVALID error when you try to connect from Azure Storage Explorer.

Examining the certificate, it looks like this:

~/Downloads/azurite-certs/127.0.0.1 > openssl x509 -in cert.pem -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 4677514164283179045 (0x40e9de5991f50025)
        Signature Algorithm: ecdsa-with-SHA384
        Issuer: CN=minica root ca 190f9e
        Validity
            Not Before: Oct  6 10:03:35 2024 GMT
            Not After : Nov  5 11:03:35 2026 GMT
        Subject: CN=127.0.0.1
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (384 bit)
                pub:
                    04:...:eb
                ASN1 OID: secp384r1
                NIST CURVE: P-384
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Authority Key Identifier: 
                F1:D0:94:63:AA:37:F6:EF:CF:5F:CD:83:80:2C:95:D0:76:6C:2A:07
            X509v3 Subject Alternative Name: 
                IP Address:127.0.0.1
    Signature Algorithm: ecdsa-with-SHA384
    Signature Value:
        30:...:50

I copied the entire azurite-certs folder to the .devcontainer folder of my project, and renamed the folder 127.0.0.1 to certs (as I found I couldn’t mount the originally named folder into the container).

Configure Azurite to use the certificate (and enable OAuth with basic checking)

This is relatively easy. Azurite only really needs to be told the path to the certificate and private key from the 127.0.0.1 folder. To do this, we can mount the folder into the container and pass the paths to Azurite in the command:

services:
  devcontainer:
    ...
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite
    ports:
      - "127.0.0.1:10000:10000"
      - "127.0.0.1:10001:10001"
      - "127.0.0.1:10002:10002"
    command: >
      azurite
      --blobHost 0.0.0.0
      --queueHost 0.0.0.0
      --tableHost 0.0.0.0
      --cert /workspace/certs/cert.pem
      --key /workspace/certs/key.pem
      --oauth basic
    volumes:
      - ./azurite-certs/certs:/workspace/certs

Configure the devcontainer to trust the certificates

This step is more complicated. To enable trust, you need to install the minica root certificate into the relevant trust stores inside the container. Thankfully, I found this script which does the trick. To use it, we’ll create our own Dockerfile which defines the devcontainer. We’ll base it on the existing image and add a couple of steps:

FROM mcr.microsoft.com/devcontainers/python:1-3.11-bullseye
# Switch to root user to install packages and update certificates
USER root
# Install ca-certificates package and libnss3-tools
RUN apt-get update && apt-get install -y ca-certificates libnss3-tools
# Copy the minica certificate to the container and install it
COPY ./azurite-certs/minica.pem /usr/local/share/ca-certificates/minica.crt
COPY ./azurite-certs/trust_minica.sh /usr/local/bin/trust_minica.sh
# Update CA certificates
RUN chmod +x /usr/local/bin/trust_minica.sh
RUN /usr/local/bin/trust_minica.sh
RUN update-ca-certificates
# Switch back to the non-root user (devcontainer default user)
USER vscode
# Keep the container running
CMD ["/bin/sh", "-c", "while sleep 1000; do :; done"]

And we’ll put the script (not my work – from this Gist) in the azurite-certs folder as trust_minica.sh:

#!/bin/sh
### Script installs minica.pem to certificate trust store of applications using NSS
### https://gist.github.com/mwidmann/115c2a7059dcce300b61f625d887e5dc
### (e.g. Firefox, Thunderbird, Chromium)
### Mozilla uses cert8, Chromium and Chrome use cert9
###
### Requirement: apt install libnss3-tools
###
###
### CA file to install (customize!)
### Retrieve Certname: openssl x509 -noout -subject -in minica.pem
###
certfile="minica.pem"
certname="minica root ca"
###
### For cert8 (legacy - DBM)
###
for certDB in $(find ~/ -name "cert8.db"); do
    certdir=$(dirname ${certDB})
    certutil -A -n "${certname}" -t "TCu,Cu,Tu" -i ${certfile} -d dbm:${certdir}
done
###
### For cert9 (SQL)
###
for certDB in $(find ~/ -name "cert9.db"); do
    certdir=$(dirname ${certDB})
    certutil -A -n "${certname}" -t "TCu,Cu,Tu" -i ${certfile} -d sql:${certdir}
done

Now we need to update our compose file to point to this new Dockerfile and use that instead of the base image. Our docker-compose.yml now looks like this:

services:
  devcontainer:
    build:
      context: .
      dockerfile: Dockerfile
    platform: linux/amd64
    volumes:
      - ..:/workspace:delegated
    ports:
      - "5000:5000"
    environment:
      - POETRY_VIRTUALENVS_IN_PROJECT=true
    command: /bin/sh -c "while sleep 1000; do :; done"
    network_mode: "host"
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite
    ports:
      - "127.0.0.1:10000:10000"
      - "127.0.0.1:10001:10001"
      - "127.0.0.1:10002:10002"
    command: >
      azurite
      --blobHost 0.0.0.0
      --queueHost 0.0.0.0
      --tableHost 0.0.0.0
      --cert /workspace/certs/cert.pem
      --key /workspace/certs/key.pem
      --oauth basic
    volumes:
      - ./azurite-certs/certs:/workspace/certs

Remember to rebuild your devcontainer after making these changes.

Configure the local credential for OAuth

You should now be able to run your code and you won’t receive any SSL certificate errors. But the credential provider will most likely complain that it could not find a credential:

[2024-08-30T10:23:59.038Z] DefaultAzureCredential failed to retrieve a token from the included credentials.
[2024-08-30T10:23:59.039Z] Attempted credentials:
[2024-08-30T10:23:59.039Z]      EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
[2024-08-30T10:23:59.039Z] Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
[2024-08-30T10:23:59.039Z]      ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable. The requested identity has not been assigned to this resource. Error: Unexpected response "{'error': 'invalid_request', 'error_description': 'Identity not found'}"
[2024-08-30T10:23:59.039Z]      SharedTokenCacheCredential: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
[2024-08-30T10:23:59.039Z]      AzureCliCredential: Please run 'az login' to set up an account
[2024-08-30T10:23:59.039Z]      AzurePowerShellCredential: PowerShell is not installed
[2024-08-30T10:23:59.039Z]      AzureDeveloperCliCredential: Azure Developer CLI could not be found. Please visit https://aka.ms/azure-dev for installation instructions and then,once installed, authenticate to your Azure account using 'azd auth login'.
[2024-08-30T10:23:59.039Z] To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.

The solution for this is relatively simple. Azurite only performs basic validation on the presented token – checking for expiry and structure – but does not validate the permissions associated with the token. So we can simply log in with the Azure CLI (az login) to ensure that a principal is available.
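
With that in place, the SDK can talk to Azurite over HTTPS using OAuth. A minimal sketch – the account URL assumes the default devstoreaccount1 account and the port mapping shown above:

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Azurite uses path-style URLs: https://<host>:<port>/<account-name>
client = BlobServiceClient(
    account_url="https://127.0.0.1:10000/devstoreaccount1",
    credential=DefaultAzureCredential(),
)
print([container.name for container in client.list_containers()])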

Configure Azure Storage Explorer to trust the minica root certificate

Azure Storage Explorer also needs to be configured to trust the new root CA. To do this, click Edit > SSL Certificates > Import Certificates and import the minica.pem file:

Next, reestablish your connection with Azurite and check the Use HTTPS box:

You can access this folder at ~/Library/Application Support/StorageExplorer/certs/ on a Mac. Restart Azurite and you’re good to go!

How to: Use Azurite as a service container in Codespaces or Devcontainers

When developing with Azure Storage, it can significantly speed up your development process if you use a local development environment, rather than constantly connecting directly to storage in Azure itself. This is where Azurite comes in – Azurite is a local storage emulator for Azure Storage, mimicking blob/container, queue and table storage. While there are lots of ways to get it running (e.g. from binary, manually using Docker etc.), I wanted to set it up as a service container in my devcontainer configuration, which provides a few benefits:

  • Consistency – every time I rebuild my devcontainer I know I’m resetting Azurite back to a known, clean state
  • Isolation – running it in a separate container means I avoid any potential side effects which might arise if it’s running in my main development container
  • Portability – it works consistently on GitHub Codespaces, local devcontainer setups etc.

This guide provides a basic outline for setting up Azurite as a service container in a devcontainer configuration.

Create docker-compose.yml to define the services

Using a docker compose file inside your devcontainer definition lets you define multiple ‘services’ which all work together. In my case, I’ve set up the devcontainer service, which is the main development environment based on the Microsoft Python devcontainer image, and azurite, which runs Azurite. Of course you could add whatever you need – PostgreSQL, Mongo and so on. Here’s my docker-compose file:

services:
  devcontainer:
    image: mcr.microsoft.com/devcontainers/python:1-3.11-bullseye
    platform: linux/amd64
    volumes:
      - ..:/workspace:delegated
    ports:
      - "5000:5000"
    environment:
      - POETRY_VIRTUALENVS_IN_PROJECT=true
    command: /bin/sh -c "while sleep 1000; do :; done"
    network_mode: "host"
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite
    ports:
      - "127.0.0.1:10000:10000"
      - "127.0.0.1:10001:10001"
      - "127.0.0.1:10002:10002"
    command: >
      azurite
      --blobHost 0.0.0.0
      --queueHost 0.0.0.0
      --tableHost 0.0.0.0

Create devcontainer.json

After defining your services, you need to create a devcontainer.json file. This configures the development environment: it’s the dockerComposeFile attribute that points to your docker-compose file, and the service attribute that names the service representing the actual devcontainer. I’ve added a bunch of features for the tools I use. In my configuration, I’ve also disabled some extensions which are recommended by the base image or other features by prefixing their names with -. Finally, I included a postCreateCommand which marks the /workspace folder as safe and installs dependencies using Poetry, although you may want to skip this last step.

{
	"name": "Python 3",
	"dockerComposeFile": ["docker-compose.yml"],
	"workspaceFolder": "/workspace",
	"service": "devcontainer",
	"features": {
		"ghcr.io/devcontainers-contrib/features/poetry:2": {"version": "1.8.3"},
		"ghcr.io/devcontainers/features/github-cli:1": {},
		"ghcr.io/devcontainers/features/node:1": {},
		"ghcr.io/devcontainers/features/azure-cli:1": {},
		"ghcr.io/flexwie/devcontainer-features/pulumi:1": {},
		"ghcr.io/prulloac/devcontainer-features/pre-commit:1": {},
		"ghcr.io/jlaundry/devcontainer-features/azure-functions-core-tools:1": {},
		"ghcr.io/devcontainers/features/rust:1": {}, // for cryptography package
		"ghcr.io/devcontainers/features/docker-outside-of-docker": {}
	},
	"customizations": {
		"vscode": {
			"extensions": [
				"-vadimcn.vscode-lldb",
				"-rust-lang.rust-analyzer",
				"-tamasfe.even-better-toml",
				"-dbaeumer.vscode-eslint"
			]
		}
	},
	"forwardPorts": [10000, 10001, 10002],
  "postCreateCommand": "git config --global --add safe.directory /workspace && poetry self add poetry-plugin-export && poetry config warnings.export false && poetry config virtualenvs.in-project true --local && poetry install --with dev --no-interaction --sync --no-root"
}

Next steps

You can connect to the locally emulated storage using a connection string based on a well-known account and key baked into Azurite:

DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;QueueEndpoint=http://127.0.0.1:10001/devstoreaccount1;TableEndpoint=http://127.0.0.1:10002/devstoreaccount1;
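
For example, a minimal Python sketch using the azure-storage-blob SDK (the container name is just an illustration):

from azure.storage.blob import BlobServiceClient

# Azurite's well-known development connection string (as above)
conn_str = (
    "DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;"
    "AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;"
    "BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;"
)

client = BlobServiceClient.from_connection_string(conn_str)
client.create_container("scratch")  # hypothetical container name
print([c.name for c in client.list_containers()])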

You might choose to add more advanced configuration, such as persistent storage, or enabling OAuth (to allow the use of the DefaultAzureCredential).

Finally, you can add additional services, such as PostgreSQL or Mongo, to aid your development process.

How to: Create SAS with multiple permissions in Pulumi

In Pulumi, when calling pulumi_azure_native.storage.list_storage_account_service_sas_output() to generate a SAS, you pass the required permissions to the permissions: Input[str | Permissions | None] parameter. pulumi_azure_native.storage.Permissions is an enum, offering simple selections (R, L etc.):

# Create a shared access signature scoped to the container
app_container_signature = (
  pulumi.Output.all(resource_group.name, storage_account.name, app_container.name)
  .apply(
    lambda args: azure_native.storage.list_storage_account_service_sas_output(
      resource_group_name=args[0],
      account_name=args[1],
      protocols=azure_native.storage.HttpProtocol.HTTPS,
      shared_access_start_time="2022-01-01",
      shared_access_expiry_time="2030-01-01",
      resource=azure_native.storage.SignedResource.C,
      permissions=azure_native.storage.Permissions.R,
      content_type="application/json",
      cache_control="max-age=5",
      content_disposition="inline",
      content_encoding="deflate",
      canonicalized_resource=f"/blob/{args[1]}/{args[2]}",
    )
  )
  .apply(lambda result: pulumi.Output.secret(result.service_sas_token))
)

But you can also pass a string of permissions, any of R, L, D, W, C, A, or P, depending on the actions you want to allow for the SAS. This allows you to specify permissions for reading (R), listing (L), deleting (D), writing (W), creating (C), adding (A), or processing (P) blobs within the container, such as permissions="RWL":

app_container_signature = (
  pulumi.Output.all(resource_group.name, storage_account.name, app_container.name)
  .apply(
    lambda args: azure_native.storage.list_storage_account_service_sas_output(
      resource_group_name=args[0],
      account_name=args[1],
      protocols=azure_native.storage.HttpProtocol.HTTPS,
      shared_access_start_time="2022-01-01",
      shared_access_expiry_time="2030-01-01",
      resource=azure_native.storage.SignedResource.C,
      permissions="RWL",
      content_type="application/json",
      cache_control="max-age=5",
      content_disposition="inline",
      content_encoding="deflate",
      canonicalized_resource=f"/blob/{args[1]}/{args[2]}",
    )
  )
  .apply(lambda result: pulumi.Output.secret(result.service_sas_token))
)

How to: retrieve storage account primary key using Pulumi

Another note to myself. I wanted to retrieve the storage account keys to give my app access to the entire account. I thought they would be a property of pulumi_azure_native.storage.StorageAccount, but they’re not. Instead you need to call pulumi_azure_native.storage.list_storage_account_keys().

import pulumi
import pulumi_azure_native as azure_native
config = pulumi.Config()
# Create a Resource Group
resource_group_name = config.require("resourceGroupName")
location = config.require("location")
resource_group = azure_native.resources.ResourceGroup(
  resource_group_name, resource_group_name=resource_group_name, location=location
)
# Create a Storage Account
storage_account = azure_native.storage.StorageAccount(
  config.require("storageAccountName"),
  resource_group_name=resource_group.name,
  sku=azure_native.storage.SkuArgs(
    name=azure_native.storage.SkuName.STANDARD_LRS,
  ),
  kind=azure_native.storage.Kind.STORAGE_V2,
  location=resource_group.location,
)
# fetch primary key
storage_account_primary_key = (
  pulumi.Output.all(resource_group.name, storage_account.name)
  .apply(lambda args: azure_native.storage.list_storage_account_keys(resource_group_name=args[0], account_name=args[1]))
  .apply(lambda accountKeys: pulumi.Output.secret(accountKeys.keys[0].value))
)

How to: check if a container exists without account level List permission

In a storage account, you can create a SAS scoped to a specific container; however, that SAS does not have permission to execute ContainerClient.exists(), as this requires at least list privileges on the parent object (e.g. the account). When you try to perform the check, you’ll get this error:

azure.core.exceptions.HttpResponseError: This request is not authorized to perform this operation.

Note that this is an HttpResponseError, not a ClientAuthenticationError (which is actually a more specific error that extends HttpResponseError), so you need to interrogate the specific response code. Note also that if the container does not exist, you might instead get a ClientAuthenticationError with this message:

Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:a52b0bdf-401e-0030-755f-fc3a0f000000\nTime:2024-09-01T11:09:19.0123125Z\nErrorCode:AuthenticationFailed\nauthenticationerrordetail:Signature did not match. String to sign used was rwl\n2022-01-01T00:00:00.0000000Z\n2030-01-01T00:00:00.0000000Z\n/blob/<account>/<container>\n\n\nhttps\n2015-04-05\nmax-age=5\ninline\ndeflate\n\napplication/json

As a workaround, you can instead try to read from or write to (depending on the privilege level you want to test for) the container. For example, you might do something like this:

from azure.core.exceptions import (
  ClientAuthenticationError,
  HttpResponseError,
  ResourceNotFoundError,
)
from azure.storage.blob import ContainerClient

def test_if_container_exists(container_client: ContainerClient) -> bool:
  """Check whether a container exists, even if the SAS scope is restricted to that container."""
  # first try the .exists() method – this works if the SAS has sufficient permissions
  try:
    return container_client.exists()
  except ClientAuthenticationError:
    # we don't know if this is because the SAS token doesn't have access
    # or because the container doesn't exist – fall through to the write test
    pass
  except ResourceNotFoundError:
    return False
  except HttpResponseError as e:
    if e.status_code == 404:
      return False
    if e.status_code != 403:
      # anything other than a SAS scope restriction is unexpected – re-raise it
      raise
    # a 403 could just be the container-scoped SAS – fall through to the write test
  # if .exists() was blocked, try to write a small blob to check if the SAS token is valid
  try:
    blob_client = container_client.get_blob_client("test_blob")
    blob_client.upload_blob(b"Test content", overwrite=True)
    return True
  except (ClientAuthenticationError, ResourceNotFoundError):
    return False