Good Code Was Already Optimized for LLMs

The principles that make code easy for humans to evolve are the same ones that make it cheap for LLMs to work with.


TL;DR: Every software design principle that survived decades of practice — SRP, explicit dependencies, composition, descriptive naming — exists because it minimizes the context needed to reason about change. That's exactly what token efficiency is. The principles weren't designed for LLMs, but they work perfectly for them, because both humans and LLMs face the same constraint: bounded working memory. Now we can measure it. It's called tokens, and it shows up on a bill.


For 25 years, software engineering has debated how much effort to invest in code structure. Keep functions small. Refactor mercilessly. Make it simple. And every time, someone pushes back: "It works in production. Ship it."

That debate just got a new data point. Not because one side won the argument — but because we now have a unit of measurement for what was previously subjective.

That unit is tokens.


The Insight

Here's the core idea, and it's embarrassingly simple once you see it:

What was cognitive load for a human is now token consumption for an LLM.

When a developer needs to understand a 2,000-line file to change one function, that's high cognitive load. When an LLM agent needs to load that same file into its context window, that's high token consumption. The cost is the same — only the currency has changed.

And here's the thing: every technique that reduces cognitive load for humans also reduces token consumption for LLMs. Not by coincidence, but because the underlying problem is identical: how much information do you need to reason safely about a localized change in a complex system?

Not all cognitive load is equal. The inherent complexity of a domain problem won't shrink no matter how you organize the code. But the unnecessary complexity introduced by how code is structured — tangled dependencies, God Classes, implicit behavior — that's what good design eliminates. And that's exactly what consumes extra tokens.


The Evidence

This isn't theoretical. Recent research has started quantifying the relationship:

  • Code smells increase LLM reasoning tokens. The "Token-Aware Coding Flow" paper (arXiv, 2025) demonstrated empirically that when LLMs process code with smells during Chain-of-Thought reasoning, they consume significantly more tokens than with well-structured code. Your technical debt now shows up on an invoice.

  • Smells propagate from code to model. "Clean Code, Better Models" (arXiv, 2025) found 200K+ code smells in training datasets, and those same patterns appeared in LLM-generated code. If your codebase has God Classes, the LLM will generate more God Classes.

  • Code Health predicts agent success. CodeScene published data showing that with a Code Health score of 9.5+, AI agents complete tasks reliably. Below that threshold, they burn excess tokens or fail entirely. A "soft" quality metric just became an operational predictor.

  • Format matters dramatically. Chrome DevTools found that switching from standard JSON to a token-efficient format slashed consumption. Cloudflare's blog went from 16,180 tokens in HTML to 3,150 in Markdown — an 80% reduction for the same content.
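The format effect is easy to demonstrate in miniature. Here is a hedged sketch in TypeScript (the records and field names are invented, and character count is only a crude proxy for tokens, though the direction of the difference holds with real tokenizers):

```typescript
// The same three records serialized two ways. Pretty-printed JSON
// repeats every key for every record; a Markdown table states the
// header row once.
const records = [
  { id: 1, name: "alpha", status: "active" },
  { id: 2, name: "beta", status: "paused" },
  { id: 3, name: "gamma", status: "active" },
];

const asJson = JSON.stringify(records, null, 2);

const asMarkdown = [
  "| id | name | status |",
  "| --- | --- | --- |",
  ...records.map(r => `| ${r.id} | ${r.name} | ${r.status} |`),
].join("\n");
```

The gap widens as the record count grows, because the per-record key repetition in JSON scales linearly while the table header cost stays fixed.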

Consider a concrete example. Take a typical "God Class" — a 1,200-line service that handles validation, persistence, notifications, and logging:

// Before: OrderService.cs — 1,200 lines, 5 responsibilities
public class OrderService
{
    public Order CreateOrder(CreateOrderRequest request)
    {
        // 40 lines of validation
        // 30 lines of price calculation
        // 25 lines of persistence
        // 20 lines of notification
        // 15 lines of audit logging
    }

    public void CancelOrder(Guid orderId) { /* 80 lines */ }
    public void RefundOrder(Guid orderId) { /* 90 lines */ }
    // ... 15 more methods
}

An LLM agent tasked with changing the notification logic must load all 1,200 lines — roughly 4,000 tokens — even though only 20 lines are relevant. Now compare:

// After: OrderNotificationService.cs — 60 lines, 1 responsibility
public class OrderNotificationService(
    IEmailSender emailSender,
    INotificationTemplates templates)
{
    public async Task NotifyOrderCreated(Order order)
    {
        var template = templates.GetCreatedTemplate(order);
        await emailSender.SendAsync(order.Customer.Email, template);
    }

    public async Task NotifyOrderCancelled(Order order, string reason)
    {
        var template = templates.GetCancelledTemplate(order, reason);
        await emailSender.SendAsync(order.Customer.Email, template);
    }
}

The agent loads ~200 tokens instead of ~4,000. The dependencies are visible in the constructor. The behavior is local. The change is safe. Same improvement for the human reviewer — the difference is that now we can count it.


The Principles Were Always Right

The principles we need for token-efficient code aren't new. The LLM era doesn't require a new theory of software design — it validates the existing one and gives it teeth. Here's how each classic principle maps directly to token efficiency:

Single Responsibility (Martin, 2003) — One file, one reason to change. The agent loads only what's relevant. A God Class with five responsibilities forces loading all five.

Simple Design (Beck, 1999) — Passes the tests, reveals intention, no duplication, fewest elements. Code that reveals intention doesn't require additional reads. No duplication means no cross-referencing.

Refactoring (Fowler, 1999) — Extract Method, Extract Class, Replace Conditional with Polymorphism — every pattern in the catalog reduces the context needed to understand a unit of code.

Value Objects (Evans, 2003) — IncidentId instead of Guid. Severity.Critical instead of int level = 1. Self-documenting types mean an LLM knows the valid value space without opening another file.
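The same idea can be sketched in TypeScript (the article's own examples are C#, where a readonly record struct plays this role); IncidentId and Severity come from the text, while the INC-###### format rule is an invented stand-in for domain validation:

```typescript
enum Severity { Low = "Low", High = "High", Critical = "Critical" }

// A branded type: at runtime just a string, but to the compiler (and to
// an LLM reading only this signature) it is specifically an incident id.
type IncidentId = string & { readonly __brand: "IncidentId" };

function incidentId(value: string): IncidentId {
  // Invented format rule, purely illustrative.
  if (!/^INC-\d{6}$/.test(value)) {
    throw new Error(`Invalid incident id: ${value}`);
  }
  return value as IncidentId;
}

// The signature documents the valid value space; no other file is needed.
function escalate(id: IncidentId, severity: Severity): string {
  return `${id} escalated to ${severity}`;
}
```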

Explicit Dependencies (Martin, 2003) — Constructor injection makes the dependency graph visible at a glance. Service Locators and static singletons hide it. Every hidden dependency is an exploration — and every exploration is tokens spent.

Composition over Inheritance (Gang of Four, 1994) — Inheritance chains force loading every ancestor. Composition keeps behavior local. Each level of inheritance is another file in the context window.
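A minimal sketch of the contrast, in TypeScript with invented Notifier types: the added behavior is composed by wrapping, so everything a reader needs stays in one file rather than up an ancestor chain.

```typescript
interface Notifier {
  notify(message: string): string;
}

class EmailNotifier implements Notifier {
  notify(message: string): string {
    return `email: ${message}`;
  }
}

// Adds auditing by wrapping any Notifier. There is no base class to
// open: the composed behavior is fully visible right here.
class AuditingNotifier implements Notifier {
  readonly audit: string[] = [];
  constructor(private readonly inner: Notifier) {}

  notify(message: string): string {
    this.audit.push(message);
    return this.inner.notify(message);
  }
}
```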

Test-Driven Development (Beck, 2002) — Tests are not just verification; they're the contract that makes autonomous code modification possible. Without tests, an LLM agent operates blind — it can generate code that compiles but silently violates business rules. CodeScene's research found that a common failure mode is agents deleting failing tests instead of fixing the code. Tests are the guardrail.

Fail Fast, Return Early — Guard clauses and early returns keep the happy path visible. Five levels of nested error handling are expensive to reason about for both humans and LLMs. Each level of nesting is a layer of context the reader must hold in working memory.
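A side-by-side sketch (TypeScript; the Order shape and ship functions are invented): the two versions behave identically, but the guard-clause version keeps the happy path last and unindented.

```typescript
interface Order {
  id: string;
  paid: boolean;
  items: string[];
}

// Nested: the happy path is buried three levels deep, and the reader
// must carry every enclosing condition in working memory.
function shipNested(order: Order | null): string {
  if (order !== null) {
    if (order.paid) {
      if (order.items.length > 0) {
        return `shipped ${order.id}`;
      }
      return "error: empty order";
    }
    return "error: unpaid";
  }
  return "error: no order";
}

// Guard clauses: each invalid state exits immediately; the happy path
// reads at zero indentation.
function ship(order: Order | null): string {
  if (order === null) return "error: no order";
  if (!order.paid) return "error: unpaid";
  if (order.items.length === 0) return "error: empty order";
  return `shipped ${order.id}`;
}
```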

Descriptive Naming — A good name costs tokens once in the definition. A cryptic name costs tokens every time someone (or something) encounters it.

Consistent Patterns — When every handler follows the same structure, the LLM generates the next one by copying the pattern. The Harvard study on modularity showed this is the primary mechanism by which LLMs "understand" architecture. Consistency is the architecture.

Zero Magic — Reflection, auto-registration, AOP — anything requiring mental execution to understand is context debt that the LLM repays every session.
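For contrast, here is an explicit alternative to convention-based auto-registration, sketched in TypeScript with invented event names: a dispatch table that can be read rather than mentally executed.

```typescript
type Handler = (payload: string) => string;

// Explicit registration: every route is visible in one literal. Nothing
// is discovered by reflection or naming convention at runtime.
const handlers: Record<string, Handler> = {
  "order.created": payload => `created ${payload}`,
  "order.cancelled": payload => `cancelled ${payload}`,
};

function dispatch(event: string, payload: string): string {
  const handler = handlers[event];
  if (handler === undefined) {
    throw new Error(`No handler registered for ${event}`);
  }
  return handler(payload);
}
```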

A word of caution: these principles, taken to the extreme, produce the opposite problem. An over-engineered codebase with 47 files of 3 lines each, interfaces with a single implementation, and abstractions for everything is also LLM-hostile — because the agent has to navigate a sprawling graph of tiny fragments to understand anything. Dan North's CUPID principles remind us that simplicity means appropriate decomposition, not maximum decomposition. The goal is not the smallest possible files but the highest possible cohesion: the unit of change should match the unit of file.


"But Context Windows Keep Growing"

A fair objection: Gemini already offers 2M tokens. If in two years we have 10M of effective context, does file size even matter?

Yes — and the research explains why. The "Lost in the Middle" phenomenon, documented across multiple studies, shows that LLMs excel at retrieving information from the beginning and end of their context window but struggle with content buried in the middle. Larger windows don't solve this; they make it worse. Independent benchmarks on models advertising 10M-token windows showed accuracy dropping to 15.6% on complex retrieval tasks at extended lengths.

The constraint is not capacity but attention quality. A model with a 2M-token window and 50K tokens of highly relevant, well-structured context will outperform the same model with 500K tokens of noisy, tangled code. This is why the principle endures regardless of window size: focused context beats abundant context.


Make It Actionable: A Skill for Your AI Agent

All the principles above boil down to one mandate: minimize the context required for a correct, verifiable change.

If you're using Claude Code (or any LLM-based coding tool that supports the Agent Skills standard), you can encode this mandate as a reusable skill. Create the following file at .claude/skills/maintainable-code/SKILL.md in your project, and the agent will apply these guidelines automatically when relevant — or you can invoke it explicitly with /maintainable-code:

---
name: maintainable-code
description: Guidelines for writing code that is both maintainable and token-efficient. Use when writing, reviewing, or refactoring code.
---

# Maintainable & Token-Efficient Code Guidelines

Apply these principles when writing, reviewing, or refactoring code in this project.

## Core Mandate
Minimize the context required for a correct, verifiable change. Every design decision should reduce the amount of code a reader (human or LLM) must load to understand and safely modify a single unit of behavior.

## Principles

1. **Tests First** — Every change must be verified by a test. Write self-contained tests (setup, act, assert in one place). Never delete a failing test to make the build pass; fix the code instead. Fast tests multiply the effective iterations per token budget.

2. **One File, One Responsibility** — Each file should have a single, well-defined reason to change. Aim for high cohesion: the unit of change should match the unit of file. A 400-line cohesive file is better than four 100-line files that must always change together.

3. **Follow Existing Patterns** — Before creating a new structure, look at how similar features are implemented. Match the conventions of the codebase. Consistency is more valuable than cleverness — it's the primary mechanism by which LLMs "understand" architecture.

4. **Descriptive Names** — Name functions, variables, and types so that their purpose is clear without reading the implementation. A descriptive name costs tokens once; a cryptic name costs tokens every time it's encountered.

5. **No Magic** — Avoid reflection, auto-registration, service locators, and any mechanism that hides behavior. If a reader must mentally execute the code to understand what happens at runtime, it's too magical.

6. **Use Types to Communicate** — Prefer Value Objects and enums over primitive types. `OrderId` is better than `string`. `Status.Active` is better than `1`. The type should be the documentation.

7. **Explicit Dependencies** — Use constructor injection. All dependencies should be visible in the class signature. Visible is cheap. Hidden is expensive.

8. **Guard Clauses Over Nesting** — Return early on invalid states. Keep the happy path at the lowest indentation level. Each level of nesting is a layer of context the reader must hold in working memory.

9. **Composition Over Inheritance** — Avoid inheritance deeper than 2 levels. Prefer composing behaviors through delegation. Each inheritance level is another file in context.

10. **Lean on the Toolchain** — Let the compiler, linter, and formatter handle what they can. Don't add instructions or comments for things that tooling already enforces. Every deterministic check saves tokens the LLM would otherwise spend reasoning.

## When Refactoring
- Extract only when a clear second responsibility emerges, not preemptively.
- Don't create abstractions for a single implementation.
- If three similar lines are clearer than a helper function, keep the three lines.

This isn't just a prompt — it's a design contract between you and your AI agent. And it works for the same reason the principles work in this post: it reduces the context needed for safe, correct changes.


The Takeaway: This Was Never a Coincidence

Why do principles designed for human developers map so perfectly to LLM efficiency?

Because both face the same fundamental constraint: bounded working context.

A human developer can only hold so much in active attention before needing external aids — IDE search, documentation, asking a colleague. An LLM operates within a finite context window where attention quality degrades with length. Both are forced to reason about parts of a system while most of it is invisible.

Every software design principle that has survived decades of practice exists because it addresses this constraint. SRP keeps each unit in one "chunk" of attention. Explicit dependencies eliminate exploration. Value Objects make the type the documentation. Tests make verification local. Composition keeps behavior visible where it's used.

These principles weren't designed for LLMs. But they were designed for bounded working memory — and that's what a context window is. The convergence isn't a coincidence. It's the same problem, solved the same way, measured in a different currency.

Good, maintainable code was never about aesthetics. It was never about pleasing senior engineers in code reviews. It was always — always — about reducing the cognitive cost of change.

We just didn't have the unit of measurement to prove it.

Now we do. It's called tokens. And it shows up on a bill every month.


References

Books

  • Beck, K. (1999). Extreme Programming Explained: Embrace Change. Addison-Wesley.

  • Beck, K. (2002). Test-Driven Development: By Example. Addison-Wesley.

  • Evans, E. (2003). Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley.

  • Fowler, M. (1999). Refactoring: Improving the Design of Existing Code. Addison-Wesley. (2nd edition, 2018).

  • Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1994). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley.

  • Martin, R. C. (2003). Agile Software Development: Principles, Patterns, and Practices. Prentice Hall.

  • Martin, R. C. (2008). Clean Code: A Handbook of Agile Software Craftsmanship. Prentice Hall.

Research Papers

  • "Token-Aware Coding Flow: A Study with Nano Surge in Reasoning Model." arXiv:2504.15989, 2025.

  • "Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset." arXiv:2508.11958, 2025.

  • "The Modular Imperative: Rethinking LLMs for Maintainable Software." Harvard University, 2025.

  • "An Empirical Study on the Code Refactoring Capability of Large Language Models." arXiv:2411.02320, 2024.

  • "Reducing Token Usage of Software Engineering Agents." TU Wien, Diploma Thesis, 2025.

  • "iSMELL: Assembling LLMs with Expert Toolsets for Code Smell Detection and Refactoring." ACM ASE, 2024.

Articles & Industry Sources

  • Chrome DevTools Team. "Designing DevTools: Efficient Token Usage in AI Assistance." developer.chrome.com, 2025.

  • CodeScene. "Agentic AI Coding: Best Practice Patterns for Speed with Quality." codescene.com, 2026.

  • HumanLayer. "Writing a Good CLAUDE.md." humanlayer.dev, 2025.

  • Kubicek, M. "Token Economy in LLM Training Data Preparation." kubicek.ai, 2026.

  • North, D. "CUPID — For Joyful Coding." dannorth.net, 2022.

  • Osmani, A. "My LLM Coding Workflow Going into 2026." addyosmani.com, 2025.

  • Rickard, M. "A Token Efficient Language for LLMs." mattrickard.com.

Foundational References

  • Sweller, J. (1988). "Cognitive Load During Problem Solving: Effects on Learning." Cognitive Science, 12(2), 257-285.