
In early 2024, Klarna made headlines around the world by announcing that its new AI chatbot had done in one month what 700 full-time customer service employees used to do. Response times had fallen from 11 minutes to under 2 minutes. The chatbot handled 2.3 million conversations across 35 languages. The company’s CEO, Sebastian Siemiatkowski, was celebrated in tech media as a leader who had made the future real. By mid-2025, Klarna had quietly started hiring those workers back.
The full story of what happened between those two moments is more instructive than either the triumphant press release or the embarrassed reversal. It is a case study in how companies measure the wrong things when they deploy AI, and why the gap between an AI productivity headline and an AI productivity reality can cost you your customers.
This matters well beyond Klarna. According to a January 2026 analysis by Harvard Business Review, only about 2% of organisations that cited AI in recent layoff announcements could point to actual implemented AI systems driving those decisions. The rest were laying off workers in anticipation of AI performance that had not yet materialised. Klarna was one of the rare cases where the AI was genuinely deployed at scale. The results are worth studying carefully.
What Klarna Actually Did in 2024
Klarna is a Swedish buy-now-pay-later company with roughly 150 million customers globally. Its customer service operation handles a high volume of relatively predictable queries: payment disputes, refund requests, account access issues, order status checks. These are exactly the kinds of tasks that AI systems trained on large volumes of historical data are supposed to handle well.
In partnership with OpenAI, Klarna deployed a customer service AI assistant in January 2024. The results in the first month were, by the company’s own metrics, striking. The chatbot resolved 2.3 million conversations, which Klarna said was the equivalent of the workload of 700 full-time agents. Average response time dropped from 11 minutes to under 2 minutes. Repeat customer contacts for the same issue fell by 25%. The company projected it would save $40 million annually.
Siemiatkowski told media the AI had proven it could do the work. Klarna did not replace those 700 workers overnight but, as Fast Company reported, it reduced headcount through attrition rather than backfilling roles, letting natural turnover accomplish quietly what a formal layoff announcement would have drawn scrutiny for.

The Metrics That Were Not in the Press Release
Volume and speed are easy to measure. They also happen to be the metrics that look best in a press release. What Klarna did not publicise, and what emerged through 2024 and into 2025, was the data on customer satisfaction.
According to reporting by CX Dive and Digital Applied, Klarna’s customer satisfaction scores dropped 22% after the AI deployment. The chatbot handled the volume. It did not handle the complexity.
The categories of customer contact that broke the system were predictable in hindsight. Emotionally charged conversations, such as customers disputing a charge they believed was fraudulent or customers in financial difficulty trying to negotiate payment terms, do not follow the structured patterns that AI is trained on. Multi-step problems, where resolution requires understanding the history of several previous interactions in context, produced responses that were technically accurate but practically useless. Edge cases, which in any high-volume customer service operation represent a small percentage of contacts but a disproportionately high percentage of customer value, were handled badly.
The customers who had the worst experiences were, by definition, the customers who most needed to feel heard. Klarna was using AI most aggressively on exactly the interactions where the cost of failure was highest.
The CEO’s Reversal
By spring 2025, Klarna began piloting a hybrid model and started hiring human agents again. In a statement that received far less coverage than his original AI announcement, Siemiatkowski acknowledged that the aggressive reduction in human headcount had gone further than it should have. As reported by Entrepreneur and confirmed by CNBC, the CEO said the company had learned that AI and human agents are not interchangeable, and that treating them as such had damaged the customer experience.
The new model Klarna built is a tiered system: AI handles triage and routine high-volume queries; human agents handle escalations, complex cases, and high-value customer interactions; the system automatically routes to a human when the AI’s confidence drops below a threshold or emotional distress signals are detected in the customer’s language. This is a sensible architecture. It is also, notably, more expensive and more complex to manage than either full AI or full human operation.
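The tiered routing described above can be sketched in a few lines. This is a hypothetical illustration of the pattern, not Klarna's actual system: the confidence threshold, the distress marker list, and all field names are assumptions made for the example.

```python
from dataclasses import dataclass

# Illustrative assumptions -- not Klarna's real values or vocabulary.
DISTRESS_MARKERS = {"fraud", "scam", "can't pay", "cannot pay", "desperate"}
CONFIDENCE_THRESHOLD = 0.75

@dataclass
class Query:
    text: str
    ai_confidence: float      # model's self-reported confidence, 0.0-1.0
    customer_value_tier: str  # e.g. "standard" or "high"

def route(query: Query) -> str:
    """Return 'human' or 'ai' for an incoming customer query."""
    lowered = query.text.lower()
    # Escalate on emotional-distress signals in the customer's language.
    if any(marker in lowered for marker in DISTRESS_MARKERS):
        return "human"
    # Escalate when the AI's confidence drops below the threshold.
    if query.ai_confidence < CONFIDENCE_THRESHOLD:
        return "human"
    # High-value customer interactions go to human agents regardless.
    if query.customer_value_tier == "high":
        return "human"
    # Routine, high-confidence, standard-tier queries stay with the AI.
    return "ai"
```

The design choice worth noting is that escalation is triggered by either signal independently: a confident answer to a distressed customer still goes to a human, which is exactly the failure mode the all-AI deployment exposed.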
The company is now hiring specifically for remote roles, targeting people with flexible schedules: students, parents, and workers in regions with lower labour costs. The irony is that the human workforce Klarna is rebuilding looks different from the one it replaced, and the terms are less stable. The AI experiment did not eliminate the need for human labour. It restructured who does it and on what terms.
What This Case Actually Tells Us
The Klarna case is significant because it is one of very few large-scale AI workforce replacement experiments with a documented outcome. Most coverage of AI and jobs consists of projections, announcements, and executive statements. Klarna actually deployed at scale and then reported what happened.
Three conclusions are supported by the evidence. First, AI can replace human workers for a specific, bounded category of tasks: high-volume, structured, low-complexity interactions where speed matters more than judgment. Within that category, the performance data is real. Two-minute response times instead of eleven are a genuine improvement.
Second, the tasks that AI handles well are not the tasks that determine customer loyalty. Customers do not form lasting relationships with a company because their routine query was answered quickly. They form or end those relationships based on how they were treated when something went wrong. That is the category AI failed at consistently.
Third, the decision to replace 700 workers was made on metrics that measured the easy part of the job and ignored the part that mattered most. This is not a technology failure. It is a measurement failure. The technology performed as designed. The design did not account for the full scope of what the job required.
This connects directly to the broader pattern we have tracked: CFO surveys in which AI-driven job-cut projections consistently outpace what AI has actually delivered in practice, and tech industry layoffs justified by AI productivity claims that are measured selectively and announced before the results are in.
What Workers and Companies Should Take From This
For workers in customer-facing roles, the Klarna case is neither reassuring nor catastrophic. The AI did replace 700 jobs. That is real. It also failed to replace the judgment, emotional intelligence, and contextual reasoning that made those jobs valuable. The parts of your role most worth developing are the parts that require you to be fully human: navigating ambiguity, reading emotional context, and taking accountability for outcomes. Those are the parts AI failed at, consistently, at scale.
For companies considering similar deployments, the Klarna data suggests a practical test: before removing human capacity, measure what happens to your most difficult 10% of customer interactions when they are handled only by AI. The routine 90% will likely be fine. The difficult 10% is where the 22% satisfaction drop lives.
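A minimal sketch of that test, under stated assumptions: rank interactions by some difficulty signal you already have (handle time, escalation count, sentiment), then compare satisfaction on the hardest decile against the rest. The field names and the difficulty score here are placeholders for the example, not a standard metric.

```python
import statistics

def satisfaction_by_difficulty(interactions, hard_fraction=0.10):
    """Split interactions into the hardest slice vs the rest and return
    mean satisfaction (e.g. CSAT on a 1-5 scale) for each.

    Each interaction is a dict with 'difficulty' (higher = harder)
    and 'csat' keys -- illustrative field names, not a real schema.
    """
    ranked = sorted(interactions, key=lambda i: i["difficulty"], reverse=True)
    cutoff = max(1, int(len(ranked) * hard_fraction))
    hard, routine = ranked[:cutoff], ranked[cutoff:]
    return {
        "hard_mean_csat": statistics.mean(i["csat"] for i in hard),
        "routine_mean_csat": statistics.mean(i["csat"] for i in routine),
    }
```

Run this separately on AI-only and human-handled interactions before removing any human capacity. A routine slice that looks fine alongside a hard slice that craters is the pattern the Klarna numbers describe.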
Klarna will not be the last company to learn this through experience. It may be one of the first to have learned it publicly enough to be useful to everyone else. Follow our Jobs and AI coverage for ongoing analysis of how AI deployments are actually performing against the claims made when they launched.
This article contains analysis based on publicly reported information from CNBC, Entrepreneur, Fast Company, CX Dive, Harvard Business Review, and Digital Applied. Interpretations reflect the author’s analysis of documented outcomes and should not be treated as predictions about any specific company or industry.