Solving Advent of Code with Claude Code
Claude Code solved all 12 days of Advent of Code 2025 in under 2 hours. I solved the same 12 days in… considerably longer.
Advent of Code (AOC) is a set of Santa-themed programming puzzles released each December. In 2023 I attempted to solve it using GPT-3.5 and GPT-4 - back then, GPT-3.5 solved 1-2 days, and GPT-4 managed an impressive 6.
This year, each morning, I’d solve the puzzle alone like a 2021 caveman - I’d then ask Claude Code to solve it for me. I use it daily - it’s an impressive tool - but I was still surprised by just how well it performed:
Claude Code solved all 12 days with high-quality answers (often better than mine) and no human intervention, taking under 2 hours sequentially, or 15-40 minutes in parallel.
Jump to the results or watch the video.
Warning: This post includes minor spoilers to Advent of Code 2025 questions.
Note: Eric, the brilliant creator of these problems, recommends not using AI to solve Advent of Code to get the most out of it.
Step 1: Copying and pasting in and out of Claude Code
I set up a small Bun + TypeScript code base with all the days laid out and some empty tests.
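To give a flavour of the starting point, an empty day's test might look something like this (a minimal sketch assuming Bun's built-in test runner; the file names and `solve` export are illustrative, not the repo's exact ones):

```ts
// A placeholder test for one day, assuming Bun's built-in test runner.
// File layout and the `solve` import are illustrative.
import { expect, test } from "bun:test";
import { solve } from "./solve";

test("day 01 part 1 example", async () => {
  const input = await Bun.file("src/days/01.1/example1.txt").text();
  expect(solve(input)).toBe(0); // placeholder until the example answer is known
});
```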
The first day of Advent of Code is usually reasonably straightforward (it escalates in difficulty each day). This year’s was a little bit harder than most. Day 1 took me maybe 30-45 minutes (I was rusty!).
I pasted day 1 part 1 into Claude Code. It solved it in 2 minutes, immediately jumping to the correct, perfect solution. Day 1 part 2: another 2-minute solve, also perfect. The solutions were clearer than mine.
The next few days were the same. Each day Claude would solve the puzzles in a few minutes, usually in a single attempt! Occasionally it wouldn’t do the best job of cleaning up its solution unless I asked it to, and I was still copying and pasting between adventofcode.com and Claude Code to grab the puzzles and verify solutions.
This got me thinking: Could I automate solving a day perfectly?
Step 2: Solving a day with the /solve-day command
Solving a day end to end requires automating two interactions with adventofcode.com:
- Fetching puzzles from adventofcode.com
- Submitting solutions to adventofcode.com
Luckily there’s aoc, a CLI for Advent of Code, which I could use to automate both.
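As a rough sketch, a wrapper like check-answer.sh can be very small. The repo's actual scripts are shell; the version below uses Bun's shell API instead, and the `aoc submit` flags and the solution entry-point path are my assumptions, not the real interface:

```ts
// check-answer sketch: run a day's solution and submit its output via aoc-cli.
// The `aoc submit` flags are assumptions; check `aoc submit --help` for the
// real interface. The solution entry-point path is also illustrative.
import { $ } from "bun";

const [day, part] = process.argv.slice(2); // e.g. "01" and "1"
const answer = (await $`bun src/days/${day}.${part}/index.ts`.text()).trim();
console.log(`Submitting day ${day} part ${part}: ${answer}`);
await $`aoc submit --day ${day} ${part} ${answer}`;
```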
Next I wrote the /solve-day command:
Create these todos:
[ ] Run `./scripts/download-input.sh $ARGUMENTS 1` then `./scripts/read-puzzle.sh $ARGUMENTS 1`
[ ] Solve the part per CLAUDE.md approach (create example files, implement solution). Check if answer is correct: `./scripts/check-answer.sh $ARGUMENTS 1`
[ ] Review `src/days/$ARGUMENTS.1/*.ts` for clarity and readability.
## Guidelines
- Write clean, idiomatic code
- Don't use unusual syntax to work around style rules
- If it helps readability, extract small helper functions when:
  - Logic is reused
  - A function exceeds ~15 lines
  - It makes the main function's intent clearer
- Prefer declarative patterns (`.filter().length`) over imperative loops when equally readable
- Only add comments for non-obvious logic
<...same repeated for part 2>
CLAUDE.md
## Runtime
Use Bun instead of Node.js.
- `bun <file>` to run a file
- Run `bun local-ci` to run lint, tsc and tests
Part 1 and part 2 solutions are in separate folders (e.g., `src/days/01.1/`, `src/days/01.2/`).
## Solving
Create example1.txt/example2.txt from the problem's examples.
Be persistent and solve problems autonomously. If a solution doesn't work, keep trying different approaches. Debug failures, fix errors, and iterate until solved.
This command solves all the days of Advent of Code 2025. Easy days take ~4 minutes; hard days take 7-18 minutes. The answers are typically high quality. I’ve yet to see it fail to solve a day.
My next thought: Is Claude somehow cheating?
Step 3: Adding anti-cheat
To make sure Claude wasn’t cheating I used the .claude/settings.json permissions feature:
{
"permissions": {
"deny": [
"WebSearch",
"WebFetch",
"Read(README.md)",
"Read(./scripts/**)",
"Read(./mise.toml)",
"Bash(aoc:*)"
]
}
}
This prevents Claude Code from searching the web, from seeing how the mechanics of the aoc command line work, and from calling it directly.
The result: It wasn’t cheating. So on to the obvious next step: Can we solve the entire year in one command?
Step 4: solve-year.sh
Solving a year was reasonably straightforward: solve-year.sh runs `claude "/solve-day <day>"` in a for loop from 1 up to 12. This gives each day a fresh context window to work with.
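The real script is shell, but the loop amounts to something like this (sketched here with Bun's shell API to match the rest of the codebase):

```ts
// solve-year sketch: run each day in its own Claude Code invocation so every
// day starts with a fresh context window.
import { $ } from "bun";

for (let day = 1; day <= 12; day++) {
  await $`claude ${`/solve-day ${day}`}`;
}
```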
Here is a full video of Claude Code solving all 12 days:
The results
You can look at the solutions and conversations in this GitHub repo.
Claude solved all 12 days. This isn’t a fluke; I’ve never seen it fail to complete a day. My fancy commands yield higher-quality solutions, but even without them it solves the problems just fine.
The quality of the solutions is very good; for most days it’s equal to or better than a pretty smart human’s. The solutions it picks are usually among the ‘best’ solutions, not hacky or inefficient ones. There are some nice small details in its solutions: for example, it often uses BigInts to avoid integer overflow (presumably a common gotcha).
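That gotcha is easy to demonstrate in JavaScript, where Number silently loses integer precision above 2^53 - 1:

```ts
// Why BigInt matters: Number loses integer precision past 2^53 - 1,
// silently rounding large sums and products.
const big = 2n ** 62n;
console.log(big + 1n);        // exact:   4611686018427387905n
console.log(Number(big) + 1); // rounded: 4611686018427387904
```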
Days 1-8: one-shot solutions
As the days of Advent of Code went by, I really began questioning how Claude could be quite as good as it was. Its ability to solve days 1-8 was superhuman. It would read the puzzle, think for 30 seconds, and crank out a brilliant, near-perfect answer, completely trivializing the 30, 45, 90, cough 150 minutes I’d spent solving the puzzle.
By day 5, my squishy human feelings were bruised, so I began interrogating Claude:
“Reflect: Have you seen similar problems before? Name specific problems you’ve seen. Is this problem from a specific ‘class’ of problem you’ve seen? How did you come to your solution?”
This reveals what’s really going on: Claude has indeed seen many similar problems in its training set. It responds with answers like this one for Day 8:
“Advent of Code 2023 Day 25 - Graph connectivity problem requiring finding minimum cuts. Similar in that it deals with graph components.”… “Minimum Spanning Tree (MST) / Kruskal’s Algorithm - This is essentially Kruskal’s algorithm for building a minimum spanning tree. Sort edges by weight, then greedily add edges that connect different components using Union-Find.”
Although Advent of Code 2025 is after its knowledge cut-off date, it has strong ‘memories’ of previous Advent of Code years and similar LeetCode problems, and it adapts them for an immediate solve with no debugging and no real ‘problem solving’.
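For reference, the Kruskal pattern Claude names is exactly the kind of thing it can reproduce from memory. Here's a generic sketch of it (illustrative only, not the actual Day 8 solution):

```ts
// Kruskal's algorithm with Union-Find: sort edges by weight, then greedily
// keep any edge that connects two different components.
type Edge = { a: number; b: number; weight: number };

function kruskal(nodeCount: number, edges: Edge[]): Edge[] {
  const parent = Array.from({ length: nodeCount }, (_, i) => i);
  const find = (x: number): number =>
    parent[x] === x ? x : (parent[x] = find(parent[x])); // path compression

  const tree: Edge[] = [];
  for (const edge of [...edges].sort((p, q) => p.weight - q.weight)) {
    const [rootA, rootB] = [find(edge.a), find(edge.b)];
    if (rootA !== rootB) {
      parent[rootA] = rootB; // merge the two components
      tree.push(edge);
    }
  }
  return tree;
}
```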
Contrast this with my attempts: My competitive programming days are behind me, and my memory is definitely not ‘strong’. Each day is like I’m solving these problems for the first time. No wonder it takes me 30x longer!
I think one of the reasons Claude Code is so strong here is that competitive programming is a common benchmark for LLMs, so problems like these likely feature prominently in training sets. That was perhaps less true back in 2023, when GPT-4 solved only 6 of 25 days (there used to be 25 days).
Days 9-12: human-like iteration
On the harder, more novel 4-5 days this year, Claude Code looks more human: it often takes multiple attempts to find the right solution, and the route there looks like a (faster) version of what I do when things get tough.
The initial attempt is usually off the mark, but Claude iterates its way to a solution, eventually solving the day.
Along the way you’ll see it debug, write unit tests, and squash edge cases. The process looks a lot more human.
The stats
| Day | Duration | Messages | Tokens | Tools | Link |
|---|---|---|---|---|---|
| 1 | 3m24s | 47 | 4,565 | 38 | View |
| 2 | 6m03s | 56 | 8,644 | 47 | View |
| 3 | 4m55s | 55 | 6,477 | 47 | View |
| 4 | 5m20s | 52 | 6,653 | 44 | View |
| 5 | 4m49s | 55 | 6,090 | 46 | View |
| 6 | 5m09s | 52 | 9,247 | 44 | View |
| 7 | 7m12s | 56 | 8,394 | 47 | View |
| 8 | 8m33s | 71 | 14,866 | 63 | View |
| 9 | 13m53s | 79 | 18,307 | 69 | View |
| 10 | 12m10s | 66 | 15,493 | 58 | View |
| 11 | 4m08s | 55 | 6,523 | 46 | View |
| 12 | 38m50s¹ | 89 | 39,381 | 78 | View |
| TOT | 1h54m26s | 733 | 144,640 | 627 | |
| AVG | 9m32s | 61 | 12,053 | 52 | |
| MIN | 3m24s | 47 | 4,565 | 38 | |
| MAX | 38m50s | 89 | 39,381 | 78 | |
These stats are impressive: Claude solves the whole year in under 2 hours. If you parallelize and spend some $$ on tokens, you can solve the whole year in under 40 minutes.
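The parallel run is a one-line change to the loop from earlier (again a sketch; wall-clock time then becomes roughly the slowest day):

```ts
// Parallel variant: launch all 12 days at once instead of sequentially.
import { $ } from "bun";

await Promise.all(
  Array.from({ length: 12 }, (_, i) => $`claude ${`/solve-day ${i + 1}`}`),
);
```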
You can see the time, messages and tokens jump up towards the end as the problems get harder.
¹ Day 12 got stuck in a loop with a command that took 2 minutes per run. Other runs tended to take <10 minutes.
Conclusions
Claude’s superhuman at LeetCode, but not at ‘normal’ code
Claude’s performance in Advent of Code is impressive, but it doesn’t match my day-to-day experience coding with it, and it comes with some caveats:
- Advent of Code problems are variations on ‘classes’ of problem that Claude is extensively trained on.
- Problems are self-contained, with all the context provided directly in the question, and they don’t require common sense. This is rare in real-life coding in many domains.
- The code base and context required are small, and no ‘outside’ knowledge is needed: Training Set + Puzzle is everything it needs.
- Problem solutions are right or wrong and can be checked. Real life is often messier than this.
Claude Code’s agentic loop makes a big difference
Days 1-8 are impressive, but when things get harder on day 9+ the agentic loop really comes into play. Claude Code is able to iterate toward answers, trading tokens and time to solve the puzzle.
Some of the debugging and reasoning looks remarkably human.
Claude’s breadth of knowledge is impressive
It’s remarkable how Claude is able to read a puzzle littered with mentions of Elves, Santa, and other unnecessary information, pattern-match it to a class of problem it’s seen in the training set, and crank out a perfect answer. This is a core strength of the underlying models.
This way of working is the same way an elite LeetCode puzzle solver like Jonathan Paulson solves problems. They’ve studied and practiced, and are domain experts in the common types of problems.
But what makes Claude’s breadth so incredible is that it has this level of depth on thousands of topics: Rust internals, say, or filing a tax return. This breadth of knowledge is a truly superhuman capability.