When was the last time you ran /context while using Claude Code?
If you haven't, it's worth doing at least once. It changes how you think about the tool.
System prompts and tools already consume about 9.8% of the context window. Claude Code's autocompaction buffer reserves another 22.5%. Before writing a single line of code, that leaves roughly 68% of the context budget for actual work. That's the baseline, and it's smaller than most people expect.
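As a back-of-envelope check (assuming a 200k-token window, which is a guess on my part; the percentages are the ones /context reports above):

```python
# Rough context budget before any work starts.
# 200k is an assumed window size; the percentages come from /context.
window = 200_000
system_and_tools = 0.098 * window    # system prompt + tool definitions
autocompact_buffer = 0.225 * window  # reserved for autocompaction
remaining = window - system_and_tools - autocompact_buffer
print(f"{remaining:,.0f} tokens left ({remaining / window:.0%} of the window)")
```

That works out to about 135,400 tokens, or 68% of the window, before a single file is read into the conversation.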
After introducing Claude Code infrastructure and guardrails into our codebase, I checked /context again. Under 50%.
I hadn't started writing code yet.
when context becomes a constraint
As the context window filled up, behavior became less predictable. The clearest signal wasn't outright failure — it was timing.
I was outlining a plan in the conversation: the structure of the change, the sequencing, the decisions I was about to implement. Mid-thought, the conversation compacted. The summary was serviceable, but some of the working detail didn't survive. When things picked back up, I had to re-anchor ideas that usually just carried through on their own. Nothing broke. Autocompaction was doing what it's designed to do. But the workflow developed a drag — this low-grade friction where I found myself repeating things, re-establishing assumptions I thought were settled, spending more time reviewing output than I'd planned for.
Sometimes I noticed it physically. I was sipping my coffee faster than usual, trying to stay ahead of the drift. My shoulders were tighter. The session had this quiet urgency to it that I couldn't quite name until I stepped back and looked at the context meter.
If you don't treat context as infrastructure, AI-assisted development quietly taxes human working memory. That cost shows up as velocity loss, but also just as a worse afternoon.
how we ended up there
Like many early adopters of Claude Code, we built structure: CLAUDE.md and rules for guardrails and alignment. Skills didn't exist yet.
We documented patterns. We explained conventions. We wrote guardrails meant to reduce ambiguity — the kind of careful, thorough documentation that feels responsible while you're writing it. Over time, our CLAUDE.md, rules, commands, and supporting files grew past 3,400 lines. Quite a bit of infrastructure for a tool that's supposed to help you move fast.
The intent was clarity. The side effect was context pressure. It didn't take long before the context window was crowded and output became unpredictable. Humanlayer's guide to writing a good CLAUDE.md helped us get started, but understanding what actually belongs in always-loaded context — that came from experience. From watching sessions drift. From running /context one too many times and not liking what we saw.
That's when we stepped back and measured what was actually consuming context.
separating constraints from workflows
What helped most was separating two kinds of guidance we'd been treating as interchangeable.
Constraints describe properties that must hold consistently — data access happens outside components, styling flows through the theme system, tests use a single framework, TypeScript is strict. They reduce the solution space. They need to survive compaction.
Procedures describe workflows — creating components, writing hooks, styling layouts, structuring tests. They're useful when relevant, but they don't need to be present all the time.
One early mistake was treating component creation steps as constraints. Folder structure and setup instructions didn't need to survive compaction. What mattered was respecting data boundaries, theming, and test structure. Once we made that distinction (and honestly, it took longer than it should have), the question became straightforward: what actually needs to be always loaded?
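In rule-file terms, the split looks something like this (the paths and steps are hypothetical examples, not our actual rules):

```markdown
<!-- Constraint: stays in always-loaded rules, must survive compaction -->
- Data access happens in hooks/services, never inside components.
- All styling flows through the theme system; no hard-coded style values.

<!-- Procedure: moves to a skill, loaded only when relevant -->
1. Create `src/components/<Name>/<Name>.tsx` from the shared template.
2. Wire styles through the theme tokens.
3. Add a colocated `<Name>.test.tsx`.
```

The first block shapes every answer; the second only matters while a component is being created.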
what we cut, what we kept
We removed step-by-step setup instructions, code examples and templates, workflow procedures, and "how-to" sections — anything that described a process rather than a boundary.
We kept core architectural constraints, non-negotiable patterns, explicit anti-patterns, and links to where detailed procedures live. The stuff that defines the shape of the system, minus the procedures that fill it in.
Detailed workflows moved into skills.
skills as progressive disclosure
Skills are on-demand context. They contain what used to live in always-loaded rules: examples, templates, and step-by-step guidance. When a skill is invoked, it loads. When it's not, it doesn't consume tokens.
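On disk, a skill is just a markdown file that loads when invoked. A minimal sketch, assuming the `SKILL.md` layout from Claude Code's skills feature; the name, description, and steps here are illustrative, not our actual files:

```markdown
---
name: component-workflow
description: Folder structure, templates, and test scaffolding for new
  components. Use when creating or restructuring a component.
---

# Component workflow

1. Create the folder: `src/components/<Name>/`.
2. Scaffold from the shared template; pull styling from the theme system.
3. Add a colocated `<Name>.test.tsx` using the project's test framework.
```

The frontmatter description is what lets the agent decide the skill is relevant; the body only costs tokens once the skill actually loads.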
Over time, the habit formed: when starting new work, invoke the relevant skill up front. Just as importantly, I learned to prompt Claude Code to check its available toolkits — and to verify that any other tools or agents involved in the workflow were also referencing the skills they needed to do their job. Treating skills as shared infrastructure (not personal memory) turned out to be essential once more than one agent or automation entered the loop.
The tradeoff is explicit invocation. If a skill isn't referenced, Claude Code won't have that procedural detail. Early on, that led to output that followed constraints but missed project-specific workflows. It's a different kind of friction — less mysterious than context pressure, but still something you learn to anticipate.
the results
After restructuring, we measured the impact directly.
Always-loaded rules dropped from roughly 12k tokens to 2k, an 83% reduction. Core constraints stayed well within the context window, and Claude Code had quite a bit more room to reason about the task at hand. Skills and memory files, which now hold the material the rules used to carry, take about 0.4% of the context window. That difference is meaningful when you're trying to stay in flow.
The workflow didn't change much. Context pressure did. Autocompaction became less disruptive. Output stayed aligned longer. Review overhead decreased as patterns stabilized. And watching the context meter became a planning signal rather than a background worry — something I could glance at and feel calm about, instead of that creeping tension of wondering when the next compaction would hit.
As the codebase evolves, the constraints have remained stable. The skills continue to grow. That asymmetry is what makes this sustainable.
what we gave up
Skills don't reload automatically after compaction. If a session compacts mid-task, they need to be re-invoked.
For teams, this requires alignment. Everyone needs to know which skills exist and when to use them. That's more overhead than "everything is always loaded," but the context savings make it worthwhile. We're still figuring out the best way to communicate this across the team — it's one of those things where the pattern is clear but the habit takes time to stick.
where linting fits
Some enforcement doesn't belong in context at all. Formatting, linting, and test execution run through hooks. They don't rely on prompt memory or conversation state. Moving enforcement out of context entirely freed more space for planning and judgment — the work that actually benefits from a wider window.
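A sketch of what that can look like in `.claude/settings.json`, using Claude Code's hook events; the lint command is a placeholder for whatever your project actually runs:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm run lint --silent" }
        ]
      }
    ]
  }
}
```

The hook fires after every file edit, so formatting and lint rules hold without spending a single token of conversation context on them.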
seeing the context cost
Before restructuring, the always-loaded portion of our context looked roughly like this:
always-loaded context
| Component | Lines |
|---|---|
| CLAUDE.md | ~340 |
| Rules (6 files) | ~3,500 |
| Total | ~3,840 |
Nearly 4,000 lines of guidance were always loaded before application code entered the picture.
after restructuring
Rules condensed dramatically once constraints and procedures were separated:
| File | Before | After | Reduction |
|---|---|---|---|
| CLAUDE.md | ~1.5k | ~400 | -73% |
| architecture.md | ~1.6k | ~300 | -81% |
| react-patterns.md | ~1.2k | ~250 | -79% |
| typescript-rules.md | ~886 | ~200 | -77% |
| testing-standards.md | ~999 | ~150 | -85% |
| security-guidelines.md | ~1.2k | ~300 | -75% |
| styling-standards.md | ~4.6k | ~400 | -91% |
| Total | ~12k | ~2k | ~83% |
That reduction kept core constraints well within the context window and reclaimed space for actual reasoning and flow work.
if you're running into this
Run /context tomorrow. Find your biggest always-loaded file.
Ask: is this defining boundaries, or describing steps? If it defines boundaries that must hold consistently, it stays. If it describes steps that apply sometimes, it moves. Start there.
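A quick way to do that audit, sketched in Python; the paths are illustrative, so point them at wherever your always-loaded files actually live:

```python
from pathlib import Path

# Rank always-loaded files by line count (a cheap proxy for token cost).
# Paths are illustrative; adjust to your project's layout.
files = [Path("CLAUDE.md"), *Path(".claude/rules").glob("*.md")]
sizes = {f: len(f.read_text().splitlines()) for f in files if f.exists()}
for f, n in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{n:6d}  {f}")
```

Whatever tops the list is the first candidate for the boundaries-vs-steps question.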
closing thought
AI-assisted development makes context a shared resource.
Once you treat it that way, certain tradeoffs become clearer. Reducing always-loaded context didn't make the system smarter. It made the workflow more predictable — gave the sessions this steadier quality where I wasn't bracing for the next compaction, wasn't holding extra things in my head just in case.
Day to day, that mattered more.