Klarna's AI agent saved the company $60 million. It handled 2.3 million conversations in its first month and cut resolution times from eleven minutes to two. It did the work of 853 full-time employees. Then Klarna's CEO went on Bloomberg and admitted the strategy had backfired — and started hiring humans back.
The AI didn't fail. That's the disorienting part. The AI worked exactly as designed. It optimized for speed and cost reduction, and it delivered both. What it didn't deliver was what Klarna actually needed: customer relationships that made people stay.
This is the most important unsolved problem in enterprise AI right now. Not whether agents can do the work — they can. But whether they can do the work in a way that serves what your organization actually values.
Prompt, context, intent — and the gap between them
So if the AI worked as designed, what exactly went wrong? Two thinkers published frameworks this month that answer that question — and their conclusions point to the same architectural gap.
Nate Jones framed the evolution cleanly in his breakdown of the Klarna case: prompt engineering told AI what to do. Context engineering tells AI what to know. Intent engineering tells AI what to want. Almost nobody is building for that third layer yet.
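One way to see the distinction is to write the same customer-service task at each layer. The strings below are purely illustrative, not any vendor's configuration format:

```python
# The same support task expressed at each layer (illustrative only):
prompt  = "Answer this customer's billing question."                       # what to do
context = "Customer of six years; history of recurring-charge tickets."    # what to know
intent  = "Optimize for retention; escalate when satisfaction is at risk." # what to want
```

On this article's reading of Klarna, the first line existed, the second was thin, and the third was missing entirely.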
Paweł Huryn published an Intent Engineering Framework that puts structure around this idea. His core argument: agents don't fail because they can't reason. They fail because their objectives, outcomes, and constraints are underspecified. Intent is what determines how an agent acts when instructions run out.
Both are right. And they're converging on the same insight from different angles — Jones from enterprise strategy, Huryn from product methodology. But there's a layer underneath that neither fully addresses: the infrastructure that makes intent engineering possible in the first place.
Why $60M in savings broke Klarna's CX
Klarna's story isn't about AI failure. It's about AI succeeding at the wrong thing.
The company deployed an OpenAI-powered customer service agent in early 2024. Two-thirds of all chats automated. Average resolution under two minutes. Projected $40 million in annual savings. The metrics looked spectacular — until they didn't.
Forrester analyst Kate Leggett called it an "overpivot to cost containment, without thinking about the longer-term impact of customer experience." By May 2025, CEO Sebastian Siemiatkowski was publicly reversing course and rehiring.
The number that tells the real story: even with $60 million in AI savings, Klarna's customer service costs rose from $42 million to $50 million year-over-year. The AI created more problems on complex cases than it solved — driving repeat contacts, escalations, and customer churn that ate through the savings and then some.
As Jones puts it: organizations have solved "can AI do this task?" and completely failed to solve "can AI do this task in a way that serves our organizational goals?"
The knowledge that never got codified
Huryn identifies the root cause precisely: when intent is incomplete, humans draw on knowledge that was never written down. Agents can't. They only know what you've codified. And most organizations have codified far less than they think.
This is the gap we see every day at gutt. Teams configure agents with clear objectives, but the strategic context those agents need lives in people's heads — decisions from a meeting six months ago, unwritten rules about edge cases, lessons from a failed deployment nobody documented.
Klarna's AI didn't know when a customer was frustrated about a recurring billing issue versus confused about a one-time charge. It didn't know about the team's decision to be more lenient with long-term customers. It didn't carry any of the institutional knowledge that experienced reps rely on — because that knowledge was never captured in a form the AI could access.
Huryn's framework identifies seven components of well-defined intent: Objective, Desired Outcomes, Health Metrics, Strategic Context, Constraints, Decision Types & Autonomy, and Stop Rules. Of these, Strategic Context — where the agent sits in the system, the business model, the organizational norms — is both the most important and the hardest to get right. In most organizations, it's fragmented across Slack threads, meeting recordings, and the institutional memory of senior team members. When those people leave, the context leaves with them.
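To make the seven components concrete, here's a minimal sketch of an intent specification as a typed structure. The field names track Huryn's components, but the encoding and the example values are our illustration, not part of his framework:

```python
from dataclasses import dataclass

@dataclass
class IntentSpec:
    """Illustrative encoding of Huryn's seven intent components."""
    objective: str                     # what the agent exists to do
    desired_outcomes: list[str]        # what "done well" looks like
    health_metrics: dict[str, str]     # metric -> acceptable range or trend
    strategic_context: list[str]       # pointers into organizational knowledge
    constraints: list[str]             # hard rules the agent must not break
    decision_autonomy: dict[str, str]  # decision type -> "auto" or "escalate"
    stop_rules: list[str]              # conditions that halt the agent

# Hypothetical values for a support agent in Klarna's position:
support_agent = IntentSpec(
    objective="Resolve billing issues while protecting customer retention",
    desired_outcomes=["Issue resolved in one contact", "Customer stays"],
    health_metrics={"csat_complex_cases": ">= 4.2 and not trending down"},
    strategic_context=["decision: be more lenient with long-term customers"],
    constraints=["Never auto-close a ticket without customer confirmation"],
    decision_autonomy={"refund_under_50": "auto", "account_closure": "escalate"},
    stop_rules=["Customer expresses frustration twice in one conversation"],
)
```

Notice that `strategic_context` is the one field you can't fill in by hand: its entries are pointers to knowledge that has to live somewhere agents can read, and those pointers dangle the moment the person who held the knowledge walks out.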
No intent engineering framework can survive that. No amount of prompt optimization will compensate for it. You can write the most precise intent specification in the world — if the organizational context it depends on doesn't exist in a form agents can access, the specification is incomplete.
Context without intent is noise. Intent without context is guesswork.
Huryn quotes Shopify CEO Tobi Lütke on context engineering: the fundamental skill of using AI well is stating a problem with enough context that the task is plausibly solvable. Then Huryn adds his sharp reframe — context without intent is noise.
We'd add the inverse: intent without context is guesswork. And guesswork at scale is exactly what happened at Klarna.
Jones identifies three missing layers in his analysis: unified context infrastructure, coherent AI workflow architecture, and organizational alignment frameworks. Strip away the labels and all three point to the same absence — there's no system that captures what an organization knows, decides, and values in a form that AI agents can use at runtime.
Intent and context aren't competing ideas. They're two halves of the same system. Intent defines what the agent should optimize for. Context gives it the material to optimize with. Klarna had neither.
How a memory layer closes the intent gap
At gutt, we build organizational context infrastructure — a knowledge graph that captures decisions, lessons, working agreements, and institutional knowledge from the tools your team already uses. It connects to AI agents through MCP (Model Context Protocol), so any agent — Claude, Copilot, ChatGPT, Cursor — can access organizational truth without custom integrations.
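As a sketch of what that looks like from the agent side, here's a client call using the official MCP Python SDK. The server command and tool name are placeholders for whatever the memory server actually exposes; only the MCP plumbing (stdio transport, session, tool call) is the real SDK surface:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder command and tool name; swap in the real server's details.
SERVER = StdioServerParameters(command="gutt-mcp")

async def fetch_context(query: str) -> str:
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "search_memory", arguments={"query": query}
            )
            # Assumes the tool returns a single text content block.
            return result.content[0].text

if __name__ == "__main__":
    print(asyncio.run(fetch_context("refund policy for long-term customers")))
```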
Here's how that maps to the intent engineering problem:
Objectives become grounded. When an agent knows your strategic priorities, product positioning, and past decisions, it can interpret objectives in the right context. "Increase customer satisfaction" means something different at a startup shipping fast versus an enterprise managing compliance. Klarna's agent didn't have this distinction — it optimized for speed because that's all it could measure.
Health metrics have baselines. Organizational memory contains the history of what worked and what didn't. Agents with access to past incidents and lessons learned can recognize when they're about to violate a health metric before it happens. Klarna's agent had no concept of "satisfaction is trending down on complex cases" because it had no memory of past interactions in context.
Constraints become discoverable. Instead of manually encoding every constraint into every agent's configuration, organizational context makes constraints emergent. The agent knows that the team decided to never auto-close tickets without customer confirmation — not because you hard-coded it, but because that decision is part of the organizational memory.
Decision autonomy is informed by precedent. When an agent encounters a situation it hasn't been explicitly briefed on, it can check how similar situations were handled before. This is the difference between Klarna's blunt automation and the kind of nuanced service that keeps customers.
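To make the constraint and precedent points concrete, here's a toy pre-flight check. The `search_memory` stub stands in for a real query against the knowledge graph (over MCP, as sketched earlier); the recorded decisions are this article's examples, and the matching is deliberately naive:

```python
def search_memory(query: str) -> list[str]:
    # Stub: a real implementation would query the knowledge graph.
    recorded = {
        "auto-close ticket": [
            "Never auto-close tickets without customer confirmation",
        ],
        "refund for long-term customer": [
            "Team decision: be more lenient with long-term customers",
        ],
    }
    return recorded.get(query, [])

def pre_action_check(action: str) -> str:
    """Constraint discovery plus precedent lookup before the agent acts."""
    decisions = search_memory(action)
    # Constraint discovery: a recorded "never" blocks the action even
    # though nobody hard-coded it into this agent's configuration.
    if any(d.startswith("Never") for d in decisions):
        return f"blocked: {decisions[0]}"
    # Precedent: follow how similar situations were handled before.
    if decisions:
        return f"proceed, guided by: {decisions[0]}"
    # Nothing in memory to draw on: this is where autonomy should narrow.
    return "no precedent found: escalate to a human"

print(pre_action_check("auto-close ticket"))
print(pre_action_check("refund for long-term customer"))
print(pre_action_check("waive late fee"))
```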
Why static intent specifications always decay
Most teams approach agent configuration as a one-time setup: define the objectives, write the constraints, deploy. But organizations change. Strategies shift. New policies get introduced. Lessons get learned.
The agent you configured in January doesn't know about the product pivot you decided on in February or the customer complaint pattern you identified in March. This is why MIT reports that 95% of generative AI pilots fail to deliver measurable impact: the pilots work in isolation, then break when they meet organizational reality.
A memory layer makes intent dynamic. The organizational context updates continuously as your team works — every decision captured, every lesson stored, every policy change reflected. Your agents stay aligned with organizational reality without manual reconfiguration.
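As a rough sketch of what dynamic intent means in practice: rebuild the agent's briefing from memory at the start of every run, instead of baking it in at deploy time. This reuses the hypothetical `search_memory` stub from the previous sketch, and the briefing format is our invention:

```python
from datetime import date

def build_briefing(objective: str) -> str:
    """Reassemble the agent's context from live memory on every run."""
    sections = {
        "Objective": [objective],
        "Recent decisions": search_memory("decisions since last release"),
        "Active constraints": search_memory("auto-close ticket"),
        "Lessons learned": search_memory("recent incident lessons"),
    }
    lines = [f"Briefing generated {date.today().isoformat()}"]
    for title, items in sections.items():
        lines.append(f"\n{title}:")
        lines.extend(f"  - {item}" for item in (items or ["(none recorded)"]))
    return "\n".join(lines)

# Configured once, current always: the briefing reflects whatever memory
# holds at run time, not whatever it held at configuration time.
print(build_briefing("Resolve billing issues while protecting retention"))
```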
The shift from methodology to infrastructure
Jones, Huryn, and a growing chorus of practitioners are describing a real shift. The industry is moving past "make the prompt better" and into "make the system smarter." That's the right direction.
But methodology alone doesn't scale. Frameworks tell you what questions to answer. Infrastructure gives your agents the answers at runtime.
Prompt engineering was the warm-up. Context engineering is where the industry is now. Intent engineering is where it's headed. And the infrastructure that makes all three work — the organizational memory that captures what your company knows, decides, and values — is what separates the 5% of AI deployments that deliver real impact from the 95% that don't.
Intent engineering is the design practice. Organizational memory is the runtime layer. You need both. One without the other gives you either well-designed agents that lack context, or context-rich systems without clear direction. Either way, you get Klarna.
Don't be the next Klarna
If you're building with AI agents, start with the question Jones raises: do your agents actually know what you're optimizing for? Then ask Huryn's follow-up: have you specified the objectives, outcomes, constraints, and strategic context your agents need?
Then ask the question almost nobody asks: where does that strategic context actually live? Is it in a document someone wrote once, or is it continuously captured from how your team operates? When a key team member leaves, do your agents lose the context they need? Can your agents access organizational history, or are they starting from zero every conversation?
Klarna's $60 million in savings bought an expensive lesson: AI without organizational memory optimizes for the wrong thing. gutt is the layer that makes sure your agents optimize for what actually matters.
Your AI agents are only as good as the context they can access. Book a demo before your organization learns this the expensive way.
