Building a CLI with 90% Claude Code
I built a CLI called patchy, written 90% by Claude Code. The goal wasn’t to let Claude “vibe code” a crappy CLI I didn’t understand. It was to see if I could get Claude to write large sections of a high-quality CLI.
My goals:
- Claude Code could do large chunks of work with minimal human intervention.
- The quality of the project remains high (it’s not AI slop!)
  - ~98% of the code meets my quality bar
  - High test coverage
    - Focus on e2e tests (inspired by a mitchellh talk)
    - Tests run fast (<10s)
- The project can be used to scaffold future CLIs.
The full codebase is on GitHub if you want to poke around. This post covers how I structured the codebase to stay hands-off without sacrificing quality.
CLI Tech Stack / Features
The project has a modern TypeScript tech stack:
- Bun: to create standalone binaries for different platforms
- Stricli: CLI framework by Bloomberg
- Clack: Interactive user prompts
- Changesets: automated release PRs
- jsonc config file with JSON schema so users get IDE completions
- Users install with:

  ```sh
  curl -fsSL https://raw.githubusercontent.com/richardgill/patchy/main/install | bash
  ```

  or

  ```sh
  npm install -g patchy-cli
  ```

  (also uses a Bun binary)
Create a strong foundation for Claude Code
I spent a lot of time upfront making sure the project had a strong foundation, shifting Claude Code’s job to more of a ‘paint by numbers’ exercise within that foundation and leaving the higher-leverage decisions with me.
Claude Code is pretty good at looking at existing code and creating code that fits in with a project. A strong foundation is the opposite of broken windows - by keeping things high quality everywhere, wherever Claude looks it finds good context that encourages it to produce code closer to being mergeable without intervention.
Core code to read jsonc config files
At the core of the patchy CLI is patchy.json:

```jsonc
{
  "$schema": "https://unpkg.com/patchy-cli@0.0.19/schema.json",
  // Git URL or local file path to clone from.
  "source_repo": "https://github.com/example/repo.git", // Override: --source-repo | env: PATCHY_SOURCE_REPO
  // Directory containing patch files.
  "patches_dir": "./patches/", // Override: --patches-dir | env: PATCHY_PATCHES_DIR
  // ...etc
}
```
I wrote core code that parses this jsonc file and lets users override each value with a CLI flag or environment variable. I factored this code out to make it reusable for future projects.
Claude helped build this, but the process was by no means ‘hands-off’ - I was heavily involved in getting it right.
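As a rough sketch, the precedence works like this (illustrative code, not patchy’s actual implementation; the helper names and the choice of jsonc-parser are my assumptions):

```ts
// A minimal sketch of flag > env var > config file > default precedence.
import { readFileSync } from "node:fs";
import { parse } from "jsonc-parser";

const config = parse(readFileSync("patchy.json", "utf8"));

type Resolve = {
  flag?: string; // value from a parsed CLI flag, e.g. --source-repo
  envVar: string; // e.g. PATCHY_SOURCE_REPO
  configKey: string; // e.g. source_repo
  defaultValue?: string;
};

const resolve = ({ flag, envVar, configKey, defaultValue }: Resolve) =>
  flag ?? process.env[envVar] ?? config[configKey] ?? defaultValue;

// e.g. resolve({ flag: flags.sourceRepo, envVar: "PATCHY_SOURCE_REPO", configKey: "source_repo" })
```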
High quality E2E tests
I put a lot of effort into building helpers for my tests to keep them as human readable as possible. Here’s an example:
it("should copy new files from patch set to repo", async () => {
const { runCli, fileContent } = await scenario({
patches: {
"001-my-set": {
"newFile.ts": 'export const hello = "world";',
},
},
});
const { result } = await runCli(`patchy apply --verbose`);
expect(result).toSucceed();
expect(result).toHaveOutput("Copied: newFile.ts");
expect(fileContent("newFile.ts")).toBe('export const hello = "world";');
});
In retrospect I should have focused on test readability earlier. Midway through the project I noticed I was finding it hard to quickly grok the e2e tests. I had to take a detour to improve things, which by that point meant migrating 200 tests (with Claude!).
Claude did assist me in creating great, readable tests, but I’ll definitely take more of the credit on this one.
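Under the hood, a scenario() helper like this mostly amounts to writing a file tree into a temp directory and running the CLI against it. Here’s a rough sketch (my own illustration, not patchy’s actual helper; the directory layout and the Bun.spawn usage are assumptions):

```ts
import { readFileSync } from "node:fs";
import { mkdir, mkdtemp, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

type FileTree = { [name: string]: string | FileTree };

// Recursively write a nested object of file contents to disk.
const writeTree = async (root: string, tree: FileTree): Promise<void> => {
  for (const [name, value] of Object.entries(tree)) {
    const target = join(root, name);
    if (typeof value === "string") {
      await mkdir(dirname(target), { recursive: true });
      await writeFile(target, value);
    } else {
      await writeTree(target, value);
    }
  }
};

export const scenario = async ({ patches }: { patches: FileTree }) => {
  // Each test gets an isolated temp directory.
  const root = await mkdtemp(join(tmpdir(), "patchy-e2e-"));
  await writeTree(join(root, "patches"), patches);
  return {
    // Run the CLI as a subprocess and capture its output and exit code.
    runCli: async (command: string) => {
      const proc = Bun.spawn(command.split(" "), {
        cwd: root,
        stdout: "pipe",
        stderr: "pipe",
      });
      const [stdout, stderr, exitCode] = await Promise.all([
        new Response(proc.stdout).text(),
        new Response(proc.stderr).text(),
        proc.exited,
      ]);
      return { result: { stdout, stderr, exitCode } };
    },
    fileContent: (path: string) =>
      readFileSync(join(root, "repo", path), "utf8"),
  };
};
```

Custom matchers like toSucceed and toHaveOutput would then be registered via expect.extend so that assertions read as plain English.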
Create strict CI rules and then use them in Claude’s loop
One of the challenges of engineering with agentic tools is deciding what to put in prompts versus what to enforce programmatically.
In this project I’ve enforced as much as possible programmatically:
- 500+ tests, mostly e2e (run in <5 seconds)
- Compile with tsc
- Unused code analysis with knip
- Code linting and auto formatting with Biome
- Other miscellaneous checks:
  - enforce *.{unit,integration,e2e}.test.ts file naming
  - enforce pinning dependencies (=0.0.1)
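These miscellaneous checks are just small scripts. As a flavor of what one can look like, here’s a hypothetical pinned-dependency check (my own sketch, not patchy’s actual misc-checks):

```ts
// Hypothetical sketch of a pinned-dependency check: fail if any dependency
// uses a version range (^, ~, >, <) instead of an exact, pinned version.
const pkg = JSON.parse(await Bun.file("package.json").text());

const deps: Record<string, string> = {
  ...pkg.dependencies,
  ...pkg.devDependencies,
};

const unpinned = Object.entries(deps).filter(([, version]) =>
  /^[\^~><]/.test(version),
);

for (const [name, version] of unpinned) {
  console.error(`❌ ${name}@${version} is not pinned to an exact version`);
}
process.exit(unpinned.length === 0 ? 0 : 1);
```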
These run in a GitHub Action, but you can also run them locally with bun run local-ci.
The output of bun run local-ci is designed to be consumed by Claude Code:
```
$ bun run local-ci
$ bun run scripts/local-ci.ts
Running local-ci: bun run typecheck, bun run check, bun run misc-checks, bun run test, bun run knip
✅ bun run misc-checks success
✅ bun run test success
✅ bun run knip success
...
```
- This runs all the CI checks in parallel.
- Successful checks have no logs to avoid polluting context
At the bottom, any failure output is collated:
```
...
❌ bun run typecheck failed:
src/constants.ts(3,7): error TS6133: 'unusedVariable' is declared but its value is never read.
$ tsgo --noEmit
❌ bun run check failed:
Checked 141 files in 40ms. No fixes applied.
Found 1 warning.
💡 Some issues can be auto-fixed. Run: bun run check-fix
$ bun run scripts/check.ts
src/constants.ts:3:7 lint/correctness/noUnusedVariables  FIXABLE  ━━━━━━━━━━
  ⚠ This variable unusedVariable is unused.
    1 │ export const PATCHY_VERSION_ENV_VAR = "PATCHY_VERSION";
    2 │
  > 3 │ const unusedVariable = "this will break lint";
... more details
error: script "check" exited with code 1
Please fix the issues above
```
My script includes some hints for Claude Code in the output:
- What to do next: “Please fix the issues above”
- How to fix Biome failures: “💡 Some issues can be auto-fixed. Run: bun run check-fix”
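Putting those behaviors together, a script in this style might look roughly like the sketch below (illustrative, not patchy’s actual scripts/local-ci.ts):

```ts
// Illustrative sketch of a local-ci runner: run every check in parallel,
// print one quiet line per success, and collate failure logs at the end.
const checks = [
  "bun run typecheck",
  "bun run check",
  "bun run misc-checks",
  "bun run test",
  "bun run knip",
];

const results = await Promise.all(
  checks.map(async (command) => {
    const proc = Bun.spawn(["sh", "-c", command], {
      stdout: "pipe",
      stderr: "pipe",
    });
    const [stdout, stderr, exitCode] = await Promise.all([
      new Response(proc.stdout).text(),
      new Response(proc.stderr).text(),
      proc.exited,
    ]);
    return { command, output: stdout + stderr, ok: exitCode === 0 };
  }),
);

// Successful checks get a single line; their logs are dropped so they
// don't pollute Claude's context window.
for (const { command, ok } of results) {
  if (ok) console.log(`✅ ${command} success`);
}

// Failure output is collated at the bottom, followed by a hint for Claude.
const failures = results.filter((r) => !r.ok);
for (const { command, output } of failures) {
  console.log(`❌ ${command} failed:\n${output}`);
}
if (failures.length > 0) {
  console.log("Please fix the issues above");
  process.exit(1);
}
```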
Now whenever Claude Code makes a change to the codebase, you can encourage it to run bun run local-ci. This runs a full CI suite in 5 seconds, and Claude will instinctively fix issues. This increases the quality of Claude Code’s output a lot and avoids many low-value interventions where you just ask it to fix things.
It’s a little tricky to convince Claude to run local-ci reliably. You can add it to CLAUDE.md, or ask Claude to add it as a ‘todo’ whilst coding. The best solution I’ve found involves a personal workflow where I run an /implement-plan command which adds bun run local-ci to a todo list every time.
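For illustration, a Claude Code custom slash command is just a markdown prompt file in .claude/commands/. A hypothetical sketch (not my actual /implement-plan) might look like:

```markdown
<!-- .claude/commands/implement-plan.md (hypothetical sketch) -->
Implement the plan described in: $ARGUMENTS

Work through the plan with a todo list, and always include a final todo:
- Run `bun run local-ci` and fix any failures before finishing.
```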
With local-ci in place, we can remove any CLAUDE.md context trying to satisfy the programmatic checks: any violations get caught by local-ci (or server CI) and fixed by Claude automatically. Instead, we can spend our context tokens on softer things that are harder to enforce programmatically.
Can Claude Code build features with minimal human intervention?
Kind of…
Towards the end of experimenting I had a couple of really nice successes where I would chat with Claude to refine precise requirements for a new command -> generate a plan -> execute the plan. Claude would produce a 95%-correct implementation, following code conventions and adding nice e2e tests. I’d then chat with Claude to fix the last 5%.
But there are still times where it does something that I find egregious, and it takes a long time to fix or revert and go again. I was definitely in the loop a bit more than I would have liked.
I have yet to unlock the embarrassingly parallel holy grail of 10 Claude Codes building the whole thing for me. Between designing specs and tracking the work to be done, I tend to find that I’m still in the loop enough that I’m the bottleneck. In practice, I’m usually only able to have 2-3 Claude Codes working at once.
Claude Code’s other benefits
This is not to downplay some of the amazing things Claude helped me achieve:
- 500+ high quality tests
  - Claude builds them for “free” in this project
  - (sidenote: possibly actually too many tests!)
- Its input on designing the CLI has elevated the end product significantly.
- It lowers the inertia for starting / attempting things, so more gets shipped.
- I got Claude to research CLI tools I like and ported functionality by doing /add-dir and copying features
The full patchy codebase is on GitHub. It’s MIT licensed, so feel free to try and create your own CLI using Claude Code.