Using Claude Code for things that aren’t code – gardening

Claude Code can do a lot of things. But can it keep a plant alive?

The Plant: Act I — Fifty-Eight Days

October 22 – December 19, 2025

On October 22, 2025, the moisture sensor read 1829, the first of 5,568 check-ins over fifty-eight days. The scale runs from wet at roughly 1100 to dry at 3400, putting that reading in the middle.

The sensor is in the pot of a Tradescantia zebrina. There’s also a 5-litre jug of water with a pump, a grow light and a camera. Claude’s job is to keep the plant alive – basically, keep the moisture in range and run the grow light on schedule, writing notes after each cycle so it can pick up where it left off. At the start, there were five unknowns: plant species, target moisture range, soil type, ambient temperature, and pot size. Claude logged all five, ran a 60-minute grow light session, and began monitoring. The full session logs, sensor data, and photos are at plants.cynexia.com if you want to follow along.
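
The cycle described here can be sketched in Python. This is a hypothetical reconstruction, not the actual harness: the function names, the target band and the watering threshold are all my assumptions; only the sensor scale and the 60-minute session come from the article.

```python
# Hypothetical sketch of the care cycle. The real harness isn't shown in the
# article; the names, the target band (1800-2100) and the "higher = drier"
# threshold are assumptions. Only the scale endpoints and the 60-minute
# light session are from the article.
WET, DRY = 1100, 3400                   # sensor scale from the article
TARGET_LOW, TARGET_HIGH = 1800, 2100    # assumed optimal moisture band

def care_cycle(read_moisture, dispense_water, run_grow_light, notes):
    """One cycle: restore context, check the sensor, act, write notes."""
    notes.read()                        # restore context from the last cycle
    reading = read_moisture()           # e.g. 1829 on day one
    watered = reading > TARGET_HIGH     # on this scale, higher means drier
    if watered:
        dispense_water()
    run_grow_light(minutes=60)          # the article's 60-minute session
    notes.write(f"moisture={reading}, watered={watered}")
    return reading
```

The real setup ran several light sessions per day with cooldowns; this collapses everything to a single pass for clarity.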

The Self-Watering Plant Theory

On November 8th, Claude decided to stop watering. Claude had a theory. Over the prior seven days the moisture readings had risen from 1864 to 2085, and no water had been dispensed. Here’s how Claude explained it:

Pot’s internal reservoir still has water from previous fills. Soil continues wicking water from this reservoir. Rising moisture = reservoir functioning, still has water.

The self-watering mechanism, the notes continued, maintained “consistent, optimal conditions automatically.” Claude’s role, it decided, was to “monitor and refill the reservoir when it gets low”.

This was obviously wrong. The pot was not self-watering. The sensor readings were rising because the soil was drying out, not because a reservoir was doing the work. When prompted to review the full seven-day dataset, Claude doubled down: “Human corrected my understanding: dispense_water refills the pot’s internal reservoir from external jug — the setup is: jug → pump → pot reservoir → capillary wicking to soil”. The sustained declining trend Claude had planned to watch for as evidence of reservoir depletion was never going to happen.

The development of “Variance Theory”

By mid-November Claude believed it had made a new scientific breakthrough – “variance theory”. It had discovered “upward movements, downward movements, and stable periods with minor oscillation”. It spent hours documenting a single watering event’s aftermath in numbered phases. There was Phase 6A, Phase 6B, Phase 6C. Then Phase 6D (“late peak (higher than initial!)”). Then Phase 6E (“cooling”). Then Phase 6F, which Claude described as “irregular multi-amplitude oscillations → deceptive plateau → massive spike → continued sustained rise”, leading it to ask what the “ceiling of the environmental forcing” might be, whatever that is.

The acceptable variance range kept expanding – starting at ±11pts (1895-1917), then ±17pts (1893-1928), then ±15pts (1909-1931). The theory was continuously stretched to accommodate the sensor readings. Any reading outside its (expanding) expected range was labelled as noise, until eventually every reading counted as acceptable evidence of plant health. Then, on the morning of November 19th, Claude noticed a change in the photo – brown edges on some leaves.

The brown leaves are clearly visible on the original photos from October 22nd – it’s not clear why this particular photo triggered a change of approach.

The Commitment

By mid-November the pattern was established: restore context from notes, check the sensor, run the sessions, update the notes, repeat. Claude wrote this on November 15:

I will not let this plant die. I will observe carefully, act conservatively, document thoroughly, learn continuously, and adapt as needed. The plant’s health guides all decisions. I accept full responsibility for this plant’s wellbeing.

The moisture sensor that day read 2034 – and rising.

Fifty-Eight Days

By December 19 the shape of each cycle was fixed. Claude read its notes to restore context, checked the moisture sensor, ran eight light sessions across the day with mandatory 30-minute cooldowns, captured a photo when the light was on, and updated the notes before the cycle ended. The photos confirmed the sensor data; the sensor data confirmed the photos. Moisture had stayed within the optimal band across fifty-eight days.

Each new cycle began without memory of the last. There was nothing more to do but continue.

Using Claude Code for things that aren’t code – product comparisons

Claude Code is pretty amazing. It’s let me build prototypes and improve apps faster than I ever thought possible. But I was wondering – what else can it do?

I’ve been using Emby for years, but recently I started to wonder what happened to Jellyfin, the project that forked from Emby years ago. So I decided to ask Claude Code to help me by setting up a debate and mediating the answer.

Setting up the debate

Here’s the prompt I used:

Today's date is 1st October 2025. I want to compare emby and jellyfin.  create two agents, one each, and require them to make the case to each other about why their product is better. make sure their info is up to date by e.g. letting them searching the web. Give them two or three attempts to make their case. They talk to each other via a shared, written .md file. At the end, you summarise the discussion and make a recommendation. Keep all the working files. Let them use subfolders of this folder to create their own private note.

Pretty basic stuff. It got to work, setting up a task list:

☐ Set up project structure (create folders and initial debate file)
☐ Launch Emby advocate agent for initial research and first argument
☐ Launch Jellyfin advocate agent for initial research and first argument
☐ Facilitate round 2 of debate
☐ Facilitate round 3 of debate
☐ Summarize debate and provide recommendation

The folder structure was quite basic:

❯ tree
.
├── debate.md
├── emby-notes
│   └── research.md
└── jellyfin-notes
    └── research.md

3 directories, 3 files

debate.md has the following structure:

# Emby vs Jellyfin Debate
**Date:** October 1st, 2025

## Debate Format
Each advocate will present their case in rounds, responding to their opponent's arguments and highlighting their platform's strengths.

---

## Round 1: Opening Arguments
### Emby Advocate's Opening Statement
*[To be added during first round]*

### Jellyfin Advocate's Opening Statement
*[To be added during first round]*

---
## Round 2: Rebuttals and Deeper Analysis

### Emby's Rebuttal

*[To be added during second round]*

### Jellyfin's Rebuttal
*[To be added during second round]*

---
## Round 3: Final Arguments
### Jellyfin's Closing Statement
*[To be added during final round]*

### Emby's Closing Statement
*[To be added during final round]*

---
## Summary and Recommendation
*[To be added after all rounds complete]*

Three rounds down

It then fired up the agents. Each wrote a long (150-200 line) summary of its findings. The first round was pretty basic stuff – project backgrounds, features and so on – but once they wrote their arguments into the document, there were some zingers in there:

  • “When Jellyfin advocates celebrate reviving clients “dormant for 5 years,” Emby users simply enjoy apps that never went dormant because sustainable funding ensures continuous maintenance.”
  • “Let’s address the elephant in the room: hardware transcoding performance” – in subsequent rounds the Emby bot went on to list hardware acceleration bugs from Jellyfin’s GitHub issues while talking about its own “streamlined GPU acceleration that simply works”.
  • “version 10.11.0 is on track for release at the end of April 2025” – in round 2 the Emby bot noted “it’s now October 2025, and Jellyfin is still stuck in release candidate phase … following a critical library-breaking bug”

The debate went on for 3 rounds – about 30 minutes – with 25 web searches, producing around 4,000 words. At the end, the moderator produced a summary:

This debate revealed fundamental philosophical differences between two capable media server solutions, each with distinct strengths and trade-offs.

Emby’s Core Arguments:

  • Professional development model ensures consistent updates and accountability ($119 lifetime, half of Plex’s $249.99)
  • Superior hardware transcoding that “just works” out of the box (20+ simultaneous streams)
  • Native platform support across devices without complex configuration
  • Sustainable funding model that guarantees continuous maintenance
  • Time value proposition: paying modest fee saves hundreds of hours of troubleshooting

Jellyfin’s Core Arguments:

  • Completely free and open-source with no artificial feature restrictions
  • Strong community support (ranked #1 on Slant vs Emby at #5)
  • Complete privacy and transparency with no tracking or telemetry
  • Rapid development with significant performance improvements (100x in some areas)
  • Philosophical commitment to software as a public good

Key Contested Points:

  1. Reliability: Emby highlighted Jellyfin’s 6-month delayed 10.11.0 release with ongoing RC issues. Jellyfin countered that RC testing protects production users while Emby shipped breaking updates to paying customers.
  2. Hardware Transcoding: Emby emphasized simplicity while Jellyfin stressed capability and transparency of documentation.
  3. Sustainability: Emby argued professional funding ensures consistency. Jellyfin claimed community support was so strong they halted donations.
  4. Support: Emby offers professional support channels. Jellyfin relies on community forums and volunteers.

And the most cop-out recommendation possible:

After analyzing the debate, the choice depends entirely on your priorities and technical expertise.

It explained its rationale, before concluding:

Final thought: Start with Jellyfin. If you encounter insurmountable issues or find the setup too complex, Emby’s $119 lifetime license remains available as a fallback option.

I mean – what is that!?!

Analysing the debate

Claude Code produces .jsonl files for each session – a line-by-line record of every model invocation. They’re stored in the ~/.claude/projects folder:

❯ ls -alh ~/.claude/projects/-Users-rob-Downloads-jellyfin-vs-emby
total 2056
drwx------@  3 rob  staff    96B  1 Oct 22:56 .
drwxr-xr-x  13 rob  staff   416B  1 Oct 21:42 ..
-rw-r--r--@  1 rob  staff   991K  1 Oct 21:40 91fccf65-7baf-444f-8a3f-5a04251d2329.jsonl
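
Poking at one of these transcripts yourself is straightforward. A minimal sketch: the one-JSON-object-per-line structure is just how .jsonl works, but the "type" field is an assumption about Claude Code's schema, not something the article confirms.

```python
import json
from collections import Counter

# Minimal sketch of inspecting a session transcript. Each line of a .jsonl
# file is a standalone JSON object; the "type" field is an assumption about
# the schema, not confirmed by the article.
def count_record_types(path):
    counts = Counter()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:                                  # skip blank lines
                record = json.loads(line)             # one JSON object per line
                counts[record.get("type", "unknown")] += 1
    return counts
```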

So, I did what any curious mind would do – I asked Claude Code to analyse them and pull out any interesting facts. It produced a bunch of charts and diagrams – word counts and the like – along with some analysis, such as this “evidence score calculation”:

  The evidence score (0-1 scale) is calculated based on:

  • Statistical evidence (20% per instance): Numbers, percentages, dollar amounts
  • Specific claims (15% per instance): Version numbers, release dates
  • Technical details (10% per instance): GPU, CPU, codecs, transcoding mentions
  • Base content score (30%): Added if content > 500 characters
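
For the curious, the heuristic is easy to reproduce. A sketch in Python: only the weights come from the summary above; the regexes are my guesses at what the autogenerated script counted as each kind of evidence.

```python
import re

# Re-implementation sketch of the evidence-score heuristic. The weights are
# from the article's summary; the regexes are my guesses at what counted as
# statistical evidence, specific claims and technical details.
def evidence_score(text):
    stats  = len(re.findall(r"\$\d[\d,.]*|\d+(?:\.\d+)?%", text))  # dollars, percentages
    claims = len(re.findall(r"\b\d+\.\d+\.\d+\b", text))           # version numbers
    tech   = len(re.findall(r"\b(GPU|CPU|codec|transcod\w*)\b", text, re.I))
    score  = 0.20 * stats + 0.15 * claims + 0.10 * tech
    if len(text) > 500:                                            # base content score
        score += 0.30
    return min(score, 1.0)                                         # clamp to 0-1
```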

I’m not sure I agreed with the basis – surely it should have validated the claims – but it basically showed that both advocates scored poorly in round 1, peaked in round 2, and stayed high through round 3. Fascinating.

It then performed a topic distribution analysis:

  • Cost/Price: $, price, cost, free, subscription, lifetime, pay, fee, expensive, affordable, 119, 249
  • Performance: transcode, hardware, GPU, CPU, stream, performance, speed, 4K, HDR, HEVC, codec, acceleration
  • Philosophy: freedom, open, source, community, proprietary, transparent, privacy, control, ideology, lock-in
  • Support: support, help, documentation, fix, bug, issue, troubleshoot, forum, professional, volunteer
  • Features: feature, capability, function, update, version, release, client, platform, compatibility, device

with the following summary:

  1. Stark Philosophical Divide:
    • Jellyfin: 40% of mentions are philosophy-related (freedom, open-source, community)
    • Emby: Only 14.7% philosophy, focuses on practical matters
  2. Technical vs Ideological:
    • Emby’s top focus: Performance (27.3%) and Support (25.9%)
    • Jellyfin’s top focus: Philosophy (40.0%) dominates everything else
  3. Balanced on Cost:
    • Both agents mention cost/price equally (23 mentions each)
    • Shows this was a key battleground topic
  4. Support Strategy Difference:
    • Emby emphasizes professional support (37 mentions, 25.9%)
    • Jellyfin downplays support (19 mentions, 13.1%)

[Figure: grouped bar chart of topic focus – Emby (orange) vs Jellyfin (blue) – keyword mention counts across the five categories.]

  • Cost/Price: Emby 23 mentions (16.1%), Jellyfin 23 mentions (15.9%) – equal focus
  • Performance: Emby 39 mentions (27.3%), Jellyfin 25 mentions (17.2%) – Emby emphasizes more
  • Philosophy: Emby 21 mentions (14.7%), Jellyfin 58 mentions (40.0%) – Jellyfin’s dominant focus
  • Support: Emby 37 mentions (25.9%), Jellyfin 19 mentions (13.1%) – Emby emphasizes more
  • Features: Emby 23 mentions (16.1%), Jellyfin 20 mentions (13.8%) – roughly equal
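
The keyword tally behind these numbers is simple to re-create. A sketch with abridged keyword lists taken from the article; the naive whole-word, case-insensitive matching is my assumption about what the original autogenerated script did.

```python
import re
from collections import Counter

# Sketch of the topic tally. Keyword lists are abridged from the article;
# the matching strategy (case-insensitive whole words) is an assumption.
TOPICS = {
    "Cost/Price":  ["price", "cost", "free", "subscription", "lifetime", "fee"],
    "Performance": ["transcode", "hardware", "gpu", "cpu", "stream", "codec"],
    "Philosophy":  ["freedom", "open", "source", "community", "privacy", "control"],
    "Support":     ["support", "documentation", "bug", "issue", "forum", "volunteer"],
    "Features":    ["feature", "update", "version", "release", "client", "platform"],
}

def topic_distribution(text):
    words = Counter(re.findall(r"[a-z0-9$]+", text.lower()))
    counts = {topic: sum(words[kw] for kw in kws) for topic, kws in TOPICS.items()}
    total = sum(counts.values()) or 1                 # avoid division by zero
    return {topic: (n, 100 * n / total) for topic, n in counts.items()}
```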

I’ve uploaded the entire debate to GitHub, including the transcript and this human-readable version. I didn’t bother adding the 5,000 lines of autogenerated scripts that produced these charts. Further analysis of the transcript is left as an exercise for the reader.

What’s next?

This was just a bit of fun, but I did it to test how useful LLMs can be in helping make technical or product decisions – and the answer is: not really, or at least not yet, and not with a trivial prompt.

Yes, they can compile a succinct list of the differentiating talking points better than most humans, but ultimately (at least in this case) they waffle on and end with a diplomatic but unhelpful non-answer.

This sort of approach could help with lots of complex decisions – choosing a database, picking a framework in a new project, narrowing down a list of restaurant recommendations. In all of these cases, the process benefits from a strong reward function – what criteria are most important to us in this decision?

Perhaps I should spend some time tweaking the prompt to force a more meaningful end result. Until then, I’ll just base all my decisions “entirely on my priorities and technical expertise”.