May the Best Agent Win

Documentation development paradigms are shifting fast. For the past decade, doc structure has been shaped by human readers: their search habits, their patience, and their need for familiar layouts. Now doc builders have to write for a new audience: AI agents. These agents ingest, index, and reason over content in fundamentally different ways.

I decided to run a simple experiment to better understand what this actually means in practice. I had convictions that I wanted to validate, plus I learn by doing*.

The experiment was straightforward. I would train two agents on different documentation sets, ask both the same questions, and see what happened.

I hoped to validate those convictions and deepen my understanding of AI. (Spoiler: my understanding improved and my convictions shifted.)

*Shout out to Cal Poly, my alma mater

Initial Experiment

I've always had an aversion to FAQ-style documentation. FAQs serve a purpose, but they tend to muddle clean information architecture, gloss over important product concepts, and go stale quickly as products evolve. Deep down I wanted to prove this.

Agents: Two Dante AI agents, identical configuration
Knowledge Base #1: Five documents that are partially AI-generated, with manually created information architecture, review, and hands-on refinement
Knowledge Base #2: Three FAQ documents fully generated by ChatGPT and untouched by humans
Subject Matter: macOS file management via the GUI

My hypothesis was that my carefully structured knowledge base would produce meaningfully better agent responses than the raw, unedited FAQ docs.

I was wrong.

What Happened

The agents performed nearly identically, and the reason quickly became obvious: macOS file management is one of the most thoroughly documented topics on the internet. The underlying model creating the "bad" doc was so well-trained on existing content that it built one of the best FAQ docs I've seen.

There was an important lesson too. FAQ docs are optimized for frequently asked questions. When you build an agent that will be asked frequently asked questions, FAQs can be a reasonable input format. My architectural preferences didn't account for the end use case.

I wanted this experiment to be more interesting, though. Back to the drawing board.

Refined Experiment

A better representation of real-world agent training requires a product that doesn't already have decades of public documentation behind it.

I went back through my old GitHub repositories looking for something obscure but familiar. I found it: my Mixpanel Implementation Inspector, a Firefox extension I built that intercepts outbound network requests to Mixpanel and renders them in a human-readable sidebar. It is useful for debugging Mixpanel implementations. It is also virtually unknown to the world.
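To give a feel for the "human-readable" part, here is a minimal sketch of the decoding step. It is not the extension's actual code (that is browser JavaScript); it assumes a form-encoded request body whose data field carries event JSON, either plain or base64-encoded, and the function names decode_tracking_body and render_event are made up for illustration.

```python
import base64
import json
from urllib.parse import parse_qs


def decode_tracking_body(body: str) -> list[dict]:
    """Decode a form-encoded body whose `data` field holds event JSON.

    Assumption: `data` is either plain JSON or base64-encoded JSON.
    """
    fields = parse_qs(body)
    raw = fields.get("data", [""])[0]
    if not raw:
        return []
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Not plain JSON, so treat it as base64-encoded JSON (assumption).
        payload = json.loads(base64.b64decode(raw).decode("utf-8"))
    # A body may carry a single event or a batch; normalize to a list.
    return payload if isinstance(payload, list) else [payload]


def render_event(event: dict) -> str:
    """Format one decoded event as readable, indented text."""
    name = event.get("event", "(unnamed)")
    props = event.get("properties", {})
    lines = [f"Event: {name}"]
    lines += [f"  {key}: {value}" for key, value in sorted(props.items())]
    return "\n".join(lines)
```

A sidebar would call render_event on each decoded event; everything beyond that (interception, filtering, display) is left out of this sketch.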

The Setup

Agents: Two Dante AI agents, identical configuration
Knowledge Base #1: One manually authored document. It is partially AI-assisted, with deliberate information architecture, manual review, and hands-on refinement
Knowledge Base #2: One document generated directly from the codebase using Claude, without additional editing
Subject Matter: Firefox Mixpanel Implementation Inspector
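For anyone who wants to run a similar comparison, the core loop is small. The sketch below is hypothetical: each agent is modeled as a plain function from question to answer, which is an assumption and not Dante AI's API, and compare_agents is an illustrative name. It only collects answers side by side; judging them is still a manual step.

```python
from typing import Callable

# Assumption: an agent is anything that maps a question string to an answer string.
Agent = Callable[[str], str]


def compare_agents(questions: list[str], agents: dict[str, Agent]) -> list[dict[str, str]]:
    """Ask every agent the same questions and collect answers side by side."""
    rows = []
    for question in questions:
        row = {"question": question}
        for name, ask in agents.items():
            row[name] = ask(question)
        rows.append(row)
    return rows
```

Each returned row holds one question plus one answer per agent, so the results drop straight into a spreadsheet for review.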

Results

Here is the full set of results for those curious.

The agent trained on manually refined documentation outperformed the one built purely from codebase-generated knowledge. That result wasn't surprising, but it did sharpen how I think about documentation as a training input because of how the weaker agent failed.

Digging into Agent #2's errors is where the good stuff is. The agent:

Provided installation guidance that technically works, but practically doesn't - Agent #2 told users to install the extension as a developer package. This works, but it is far more painful than just pointing them to the Mozilla Add-On store. Even worse, this method requires reinstallation every time Firefox restarts. Added activation friction is no good.
Provided vague reassurance on permissions - When asked about permissions, Agent #2 responded that the extension doesn't require anything "beyond what is standard for Firefox extensions." Technically true, but useless and potentially alarming. In a world where browser data privacy is a constant concern, hand-waving at data access can compound into legal headaches.
Categorized inherent behavior as limitations - When asked about limitations, Agent #2 flagged that the extension only captures outbound Mixpanel events, framing this as a constraint. It isn't: outbound is the entire model. Mixpanel doesn't push data to the client; event tracking is by definition a client-to-server flow. Calling this a limitation would actively undermine a sales conversation or erode user confidence in a product that's working exactly as intended.

All of this said, Agent #2 did outperform Agent #1 on certain questions. This surprised me. It underscored the importance of leveraging AI-generated knowledge rather than assuming humans with expertise will always outperform AI.

Findings

The biggest finding? FAQ docs have a place in AI workflows. Okay, that's not actually the biggest finding… but I'll begrudgingly concede that point.

In all seriousness, iterate. Every agent in the experiment improved when I reviewed its weaknesses and adjusted the training input to address them.

None of that required an experiment to know. It is not a revelation that better input yields better output. What the experiment did was make the pull of skipping that step visceral. The temptation to just launch an agent, or to vibe-code your documentation and move on, is real. It's fast, it feels productive, and the agent is usually decent out of the gate.

But decent is where most people stop. The real returns on AI investment show up in the gap between decent and consistently reliable. Whether you like it or not, closing that gap requires actually paying attention to your systems.