Claude Code is pretty amazing. It’s let me build prototypes and improve apps faster than I ever thought possible. But I was wondering – what else can it do?
I’ve been using Emby for years, but recently I started to wonder what happened to Jellyfin, the project that forked from Emby years ago. So I decided to ask Claude Code to help me by setting up a debate and moderating the answer.
Setting up the debate
Here’s the prompt I used:
Today's date is 1st October 2025. I want to compare emby and jellyfin. create two agents, one each, and require them to make the case to each other about why their product is better. make sure their info is up to date by e.g. letting them searching the web. Give them two or three attempts to make their case. They talk to each other via a shared, written .md file. At the end, you summarise the discussion and make a recommendation. Keep all the working files. Let them use subfolders of this folder to create their own private note.
Pretty basic stuff. It got to work, setting up a task list:
☐ Set up project structure (create folders and initial debate file)
☐ Launch Emby advocate agent for initial research and first argument
☐ Launch Jellyfin advocate agent for initial research and first argument
☐ Facilitate round 2 of debate
☐ Facilitate round 3 of debate
☐ Summarize debate and provide recommendation
The folder structure was quite basic:
❯ tree
.
├── debate.md
├── emby-notes
│   └── research.md
└── jellyfin-notes
    └── research.md
3 directories, 3 files
debate.md has the following structure:
# Emby vs Jellyfin Debate
**Date:** October 1st, 2025
## Debate Format
Each advocate will present their case in rounds, responding to their opponent's arguments and highlighting their platform's strengths.
---
## Round 1: Opening Arguments
### Emby Advocate's Opening Statement
*[To be added during first round]*
### Jellyfin Advocate's Opening Statement
*[To be added during first round]*
---
## Round 2: Rebuttals and Deeper Analysis
### Emby's Rebuttal
*[To be added during second round]*
### Jellyfin's Rebuttal
*[To be added during second round]*
---
## Round 3: Final Arguments
### Jellyfin's Closing Statement
*[To be added during final round]*
### Emby's Closing Statement
*[To be added during final round]*
---
## Summary and Recommendation
*[To be added after all rounds complete]*
Three rounds down
It then fired up the agents. Each wrote a long (150-200 line) summary of its findings. The first round was pretty basic stuff – project backgrounds, features, etc. Then they wrote their arguments into the document, and there were some zingers in there:
- “When Jellyfin advocates celebrate reviving clients “dormant for 5 years,” Emby users simply enjoy apps that never went dormant because sustainable funding ensures continuous maintenance.”
- “Let’s address the elephant in the room: hardware transcoding performance” – in subsequent rounds the Emby bot went on to list hardware acceleration bugs from Jellyfin’s GitHub issues while talking about its own “streamlined GPU acceleration that simply works”.
- “version 10.11.0 is on track for release at the end of April 2025” – in round 2 the Emby bot noted “it’s now October 2025, and Jellyfin is still stuck in release candidate phase … following a critical library-breaking bug”
The debate went on for 3 rounds – about 30 minutes – with 25 web searches, producing around 4,000 words. At the end, the moderator produced a summary:
This debate revealed fundamental philosophical differences between two capable media server solutions, each with distinct strengths and trade-offs.
Emby’s Core Arguments:
- Professional development model ensures consistent updates and accountability ($119 lifetime, half of Plex’s $249.99)
- Superior hardware transcoding that “just works” out of the box (20+ simultaneous streams)
- Native platform support across devices without complex configuration
- Sustainable funding model that guarantees continuous maintenance
- Time value proposition: paying modest fee saves hundreds of hours of troubleshooting
Jellyfin’s Core Arguments:
- Completely free and open-source with no artificial feature restrictions
- Strong community support (ranked #1 on Slant vs Emby at #5)
- Complete privacy and transparency with no tracking or telemetry
- Rapid development with significant performance improvements (100x in some areas)
- Philosophical commitment to software as a public good
Key Contested Points:
- Reliability: Emby highlighted Jellyfin’s 6-month delayed 10.11.0 release with ongoing RC issues. Jellyfin countered that RC testing protects production users while Emby shipped breaking updates to paying customers.
- Hardware Transcoding: Emby emphasized simplicity while Jellyfin stressed capability and transparency of documentation.
- Sustainability: Emby argued professional funding ensures consistency. Jellyfin claimed community support was so strong they halted donations.
- Support: Emby offers professional support channels. Jellyfin relies on community forums and volunteers.
And the most cop-out recommendation possible:
After analyzing the debate, the choice depends entirely on your priorities and technical expertise.
It explained its rationale before concluding:
Final thought: Start with Jellyfin. If you encounter insurmountable issues or find the setup too complex, Emby’s $119 lifetime license remains available as a fallback option.
I mean – what is that!?!
Analysing the debate
Claude Code produces .jsonl files for each agent invocation, containing a line-by-line record of every model call. They’re stored in the ~/.claude/projects folder:
❯ ls -alh ~/.claude/projects/-Users-rob-Downloads-jellyfin-vs-emby
total 2056
drwx------@ 3 rob staff 96B 1 Oct 22:56 .
drwxr-xr-x 13 rob staff 416B 1 Oct 21:42 ..
-rw-r--r--@ 1 rob staff 991K 1 Oct 21:40 91fccf65-7baf-444f-8a3f-5a04251d2329.jsonl
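If you want to poke around in one of these transcripts yourself, a few lines of Python are enough to tally the turns and tool calls. This is a minimal sketch – the JSONL schema isn’t formally documented, so the field names here (type, message.content, tool_use) are best-effort guesses that may change between Claude Code versions:

```python
import json
from collections import Counter
from pathlib import Path

# Path to the session transcript shown above
transcript = (Path.home() / ".claude/projects/-Users-rob-Downloads-jellyfin-vs-emby"
              / "91fccf65-7baf-444f-8a3f-5a04251d2329.jsonl")

turn_types = Counter()   # user / assistant / etc.
tool_calls = Counter()   # WebSearch, Write, Task, ...

with transcript.open() as f:
    for line in f:
        entry = json.loads(line)
        turn_types[entry.get("type", "unknown")] += 1
        # Assistant turns carry the model output; tool calls show up as tool_use blocks
        msg = entry.get("message")
        content = msg.get("content", []) if isinstance(msg, dict) else []
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "tool_use":
                    tool_calls[block.get("name", "unknown")] += 1

print(turn_types)
print(tool_calls)
```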
So, I did what any curious mind would do – I asked Claude Code to analyse them and pull out any interesting facts. It produced a bunch of charts and diagrams – word counts and the like.
It also produced some written analysis, such as this “evidence score calculation”:
The evidence score (0-1 scale) is calculated based on:
- Statistical evidence (20% per instance): Numbers, percentages, dollar amounts
- Specific claims (15% per instance): Version numbers, release dates
- Technical details (10% per instance): GPU, CPU, codecs, transcoding mentions
- Base content score (30%): Added if content > 500 characters
I’m not sure I agreed with the basis – surely it should have validated the claims – but it basically showed that both agents scored poorly in round 1, maximised in round 2, and stayed high through round 3. Fascinating.
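Here’s roughly what that heuristic could look like in code. To be clear, this is a sketch based on the description above, not the script Claude actually generated – the weights are the ones it quoted, but the pattern-matching is my guess at how it spotted each category:

```python
import re

def evidence_score(text: str) -> float:
    """Reconstruction of the 'evidence score' heuristic described above (0-1 scale)."""
    score = 0.0
    # Statistical evidence: numbers, percentages, dollar amounts (20% per instance)
    stats = re.findall(r"\$\d[\d,.]*|\d+(?:\.\d+)?%|\b\d{3,}\b", text)
    score += 0.20 * len(stats)
    # Specific claims: version numbers and release dates (15% per instance)
    claims = re.findall(r"\b\d+\.\d+\.\d+\b|\b(?:January|February|March|April|May|June|"
                        r"July|August|September|October|November|December)\s+\d{4}\b", text)
    score += 0.15 * len(claims)
    # Technical details: GPU, CPU, codec, transcoding mentions (10% per instance)
    tech = re.findall(r"\b(?:GPU|CPU|codec|transcod\w*|HEVC|HDR|4K)\b", text, re.IGNORECASE)
    score += 0.10 * len(tech)
    # Base content score: +30% if the argument runs past 500 characters
    if len(text) > 500:
        score += 0.30
    return min(score, 1.0)  # clamp to the 0-1 scale
```

Scoring 20% per instance means a couple of dollar figures and one version number already pin the score at 1.0, which probably explains why both agents maxed out from round 2 onwards.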
It then performed a topic distribution analysis:
- Cost/Price: $, price, cost, free, subscription, lifetime, pay, fee, expensive, affordable, 119, 249
- Performance: transcode, hardware, GPU, CPU, stream, performance, speed, 4K, HDR, HEVC, codec, acceleration
- Philosophy: freedom, open, source, community, proprietary, transparent, privacy, control, ideology, lock-in
- Support: support, help, documentation, fix, bug, issue, troubleshoot, forum, professional, volunteer
- Features: feature, capability, function, update, version, release, client, platform, compatibility, device
with the following summary:
- Stark Philosophical Divide:
  - Jellyfin: 40% of mentions are philosophy-related (freedom, open-source, community)
  - Emby: Only 14.7% philosophy, focuses on practical matters
- Technical vs Ideological:
  - Emby’s top focus: Performance (27.3%) and Support (25.9%)
  - Jellyfin’s top focus: Philosophy (40.0%) dominates everything else
- Balanced on Cost:
  - Both agents mention cost/price equally (23 mentions each)
  - Shows this was a key battleground topic
- Support Strategy Difference:
  - Emby emphasizes professional support (37 mentions, 25.9%)
  - Jellyfin downplays support (19 mentions, 13.1%)
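The topic distribution is just keyword counting over those buckets, so it’s easy to reproduce. Again, this is a sketch of my own rather than the autogenerated script – it uses plain substring matches, so expect slightly different numbers to the ones above:

```python
from collections import Counter

# The keyword buckets Claude used for its topic distribution
TOPICS = {
    "Cost/Price":  ["$", "price", "cost", "free", "subscription", "lifetime",
                    "pay", "fee", "expensive", "affordable", "119", "249"],
    "Performance": ["transcode", "hardware", "gpu", "cpu", "stream", "performance",
                    "speed", "4k", "hdr", "hevc", "codec", "acceleration"],
    "Philosophy":  ["freedom", "open", "source", "community", "proprietary",
                    "transparent", "privacy", "control", "ideology", "lock-in"],
    "Support":     ["support", "help", "documentation", "fix", "bug", "issue",
                    "troubleshoot", "forum", "professional", "volunteer"],
    "Features":    ["feature", "capability", "function", "update", "version",
                    "release", "client", "platform", "compatibility", "device"],
}

def topic_distribution(text: str) -> dict[str, float]:
    """Return each topic's share (%) of all keyword hits in the text."""
    text = text.lower()
    counts = Counter({topic: sum(text.count(kw) for kw in keywords)
                      for topic, keywords in TOPICS.items()})
    total = sum(counts.values()) or 1
    return {topic: round(100 * n / total, 1) for topic, n in counts.items()}

# Usage: split debate.md into each advocate's sections and compare the shares
# print(topic_distribution(open("debate.md").read()))
```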

I’ve uploaded the entire debate to GitHub, including the transcript and this human-readable version. I didn’t bother adding the 5,000 lines of autogenerated scripts that produced these charts. Further analysis of the transcript is left as an exercise for the reader.
What’s next?
This was just a bit of fun, but I did it to test how useful LLMs can be in helping make technical or product decisions – and the answer is: not really, or at least not yet, and not with a trivial prompt.
Yes, they can compile a succinct list of the differentiating talking points better than most humans, but ultimately (at least in this case) they waffle on and end with a diplomatic but unhelpful non-answer.
This sort of approach could help with lots of complex decisions – choosing a database, picking a framework for a new project, narrowing down a list of restaurant recommendations. In all of these cases, the process benefits from a strong reward function – what criteria are most important to us in this decision?
Perhaps I should spend some time tweaking the prompt to force a more meaningful end result. Until then, I’ll just base all my decisions “entirely on my priorities and technical expertise”.