Claims without humans: From workflow automation to autonomous adjusters
Authors: Peter Wahlgren, Viktor Ekberg
For decades, insurers have pushed for straight-through processing (STP) to cut cost and cycle time. Now, with AI agents that can parse damage photos, draft settlements, and auto-deny claims, we’re entering the STP endgame. The upside is obvious: speed, margin, and scale. But so are the new risks. Who’s responsible when a customer gets denied by a model? What’s the minimum human touch needed to ensure fairness, or even to defend a decision in court? In AI just broke your trust flow: humans are back into the loop [1], we argued that generative AI has broken the old trust flow. In claims, the handover between AI and human isn’t a technical issue; it’s a legal, ethical, and operational one [2], and it calls for a new playbook.
"If AI agents are going to make calls on behalf of the business, then we need to give them the same oversight we’d expect from any employee"
Viktor Ekberg, Algorithma
STP has been the north star of claims for over three decades [3]. It’s what every system upgrade, vendor pitch, and transformation program has promised: fewer manual steps, fewer handoffs, faster payouts, and lower loss adjustment expense. But as we approach full automation, something changes. The efficiency frontier meets the trust boundary.
The first wave of STP was about digitizing the obvious: optical character recognition for faxes, structured first notice of loss forms, claims workflows routed by rules engines. Then came the shift from rules to prediction: scoring models for fraud, severity, and recovery potential. Over the last decade, AI began taking on narrow, high-volume tasks: damage detection from photos, VIN decoding, invoice matching. These were still human-reviewed systems, but the path was clear. Automate the routine, escalate the edge cases.
Even as underwriting, billing, and servicing went digital, claims stayed stubbornly human. Not because they couldn’t be automated, but because they shouldn’t. Adjusters interpret incomplete narratives, assess credibility, weigh context, and negotiate outcomes. These aren’t just tasks, they’re judgment calls. That made claims the last human mile in an increasingly automated insurance stack. Until now.
What changes with GenAI isn’t just capability, it’s coverage. Image triage models don’t just flag damage [4]; they suggest repair scopes. LLM-powered bots can summarize adjuster notes, prefill decisions, and even simulate negotiation. In some cases, AI agents now propose, escalate, and settle claims without human input [5]. The last mile isn’t being augmented. It’s being replaced. Welcome to straight-through AI processing: STAIP [6].
Why trust gaps widen as humans exit
STAIP delivers speed, but it also removes the friction points where humans naturally apply judgment, spot anomalies, and explain decisions. As agents take on end-to-end claim resolution, trust gaps widen: between customer and carrier, between model and regulator, between outcome and accountability. These aren’t technical bugs. They’re structural risks.
Automation assumes the input is valid. But in claims, that assumption is eroding. AI-generated accident photos, doctored invoices, and synthetic medical documents are flooding claims pipelines [7]. One UK carrier reported a 300% increase in detected image-based fraud in just one year [8]. In low-touch pipelines, there’s no human to flag that a tree shadow looks wrong, or that a license plate doesn’t match the VIN. Agents don’t just automate workflows, they amplify exposure. The more you automate, the more important it becomes to verify upstream. AI can detect fraud, but it can also get defrauded.
When a human adjuster denies a claim, they document their rationale, cite policy language, and can answer follow-up questions. When a model denies a claim, the customer may get a generic message, or worse, no explanation at all. In STAIP flows, that’s not a UX issue, it’s a governance gap. Regulators are starting to ask: Who made the decision, and how? And with upcoming rules like the EU AI Act [9] and ISO’s human oversight standard [10], that question won’t be optional. Opacity isn’t just a design flaw, it’s a liability.
Claims decisions are shaped by historical data: past settlements, adjuster notes, payout averages. But that data reflects past bias across geography, demographics, and documentation quality. When fed into opaque models, those patterns get institutionalized. Some carriers have already flagged disparities in denial rates by region or customer segment. Without controls, autonomous adjusting risks turning legacy bias into automated discrimination. And once it’s embedded in a model, it scales fast and invisibly.
The minimum-touch test
Not every claim needs a human. But some absolutely do. The question isn't whether to automate, it’s where human judgment still adds material trust, fairness, or legal defensibility. That’s what the minimum-touch test is designed to answer. It’s a simple, two-axis framework that helps claims leaders decide when zero-touch is justifiable, and when it becomes a governance failure waiting to happen.
In AI just broke your trust flow [11], we showed how automation fails when it assumes input data is reliable by default. The minimum-touch test picks up from there, giving claims teams a way to decide, case by case, where that assumption still holds and where it doesn’t.
Two axes: claim complexity × evidence trustworthiness
Plot any claim along two dimensions:
Claim complexity: How much discretion or judgment is needed to resolve it?
Evidence trustworthiness: How verifiable is the input?
Claims that are low complexity and backed by high-trust evidence are natural candidates for automation. Everything else requires escalation, either before or after the AI agent acts.
Three lanes: zero-touch, secondary-review, primary-human
Once plotted, every claim falls into one of three operational lanes:
Zero-touch: low complexity, high-trust evidence. The agent resolves and settles the claim end to end, with no human in the flow.
Secondary-review: the agent acts, but a human reviews, samples, or audits the outcome after the fact.
Primary-human: a human owns the decision from the start; the agent assists with summaries, evidence checks, and drafts.
The goal isn’t to slow things down, it’s to apply human time where it matters most.
Even in zero-touch lanes, trust must be conditional, not assumed. High-speed automation demands high-trust input, and mechanisms to detect when that trust is being abused.
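To make the lanes concrete, here is a minimal sketch of how the minimum-touch test could be codified as a triage rule. The field names, cutoff values, and vulnerability flag are illustrative assumptions, not a reference implementation; the point is that the two axes and three lanes can live as an explicit, auditable rule rather than as implicit model behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    ZERO_TOUCH = "zero-touch"              # agent resolves and settles on its own
    SECONDARY_REVIEW = "secondary-review"  # agent acts, human reviews after the fact
    PRIMARY_HUMAN = "primary-human"        # human owns the decision, agent assists


@dataclass
class Claim:
    complexity: float                 # 0.0 (routine) to 1.0 (heavy judgment); illustrative scale
    evidence_trust: float             # 0.0 (unverifiable) to 1.0 (verified at source)
    vulnerable_customer: bool = False  # declared support needs or similar sensitivity flags


def minimum_touch_lane(claim: Claim,
                       complexity_cutoff: float = 0.3,
                       trust_floor: float = 0.8) -> Lane:
    """Assign a claim to an operational lane from the two axes.

    The cutoffs are placeholders a carrier would calibrate per product line
    and back-test against historical outcomes.
    """
    if claim.vulnerable_customer:
        return Lane.PRIMARY_HUMAN
    if claim.complexity <= complexity_cutoff and claim.evidence_trust >= trust_floor:
        return Lane.ZERO_TOUCH
    if claim.complexity <= complexity_cutoff or claim.evidence_trust >= trust_floor:
        return Lane.SECONDARY_REVIEW
    return Lane.PRIMARY_HUMAN


# A simple, well-evidenced claim lands in the zero-touch lane
print(minimum_touch_lane(Claim(complexity=0.1, evidence_trust=0.95)))  # Lane.ZERO_TOUCH
```

Because the rule is explicit, the cutoffs themselves become something you can version, audit, and defend.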
This framework is a triage mechanism, one that carriers can plug into existing workflows, dashboard rules, or even claim-type configuration. And it scales. Use it to codify override triggers, design audit plans, and defend your decision architecture under scrutiny.
Human-override design patterns
If we measure enterprise AI by the work it owns, not just the math it runs [12], then overrides aren’t edge-case handling, they’re how an agent defines its role. A reliable claims agent isn’t one that never asks for help. It’s one that knows when to pause, escalate, or hand the case to a human entirely. Override design is what turns AI from a tool into a team member, with judgment boundaries, accountability logic, and escalation authority.
Confidence thresholds are the first layer. A capable agent should own decisions when the facts are clear, the signal is strong, and the impact is low. But when confidence drops, the agent shouldn’t guess, it should escalate. Smart systems make this reflexive: above 95% confidence, go ahead. Between 70–95%, pause and flag. Below that, it’s not your call. This isn’t about limiting autonomy. It’s about anchoring it to conditions the business is willing to defend.
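A minimal sketch of that reflex, assuming the illustrative 95% and 70% bands above and hypothetical action labels:

```python
def route_by_confidence(confidence: float) -> str:
    """Route an agent decision based on model confidence.

    Bands mirror the illustrative thresholds in the text:
    >= 0.95 act autonomously, 0.70-0.95 pause and flag, below 0.70 hand off.
    """
    if confidence >= 0.95:
        return "auto_decide"     # agent owns the decision
    if confidence >= 0.70:
        return "pause_and_flag"  # agent drafts, human confirms
    return "hand_off"            # not the agent's call


assert route_by_confidence(0.97) == "auto_decide"
assert route_by_confidence(0.82) == "pause_and_flag"
assert route_by_confidence(0.40) == "hand_off"
```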
But escalation isn’t just statistical, it’s ethical and contextual. Some decisions carry more than financial risk. Agents must be aware of protected-class indicators, declared support needs, and outcome sensitivity. A human wouldn’t deny a bodily injury claim from a vulnerable customer based solely on policy text and scanned forms. An agent shouldn’t either. That moment of escalation is a signal that the work exceeds the agent’s span of responsibility [13], and that’s a good thing. It shows the system knows its limits.
What separates autonomous from reckless isn’t confidence, it’s something akin to “self-awareness”. That’s why override behavior should be visible, auditable, and measurable. A real-time dashboard showing how often AI agents escalate, when humans step in, and where payout patterns diverge by segment isn’t governance overhead. It’s how you manage a digital workforce.
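As a sketch of what such telemetry could compute, assuming a hypothetical decision log with per-claim escalation, override, and payout fields:

```python
from collections import defaultdict

# Hypothetical decision log: one record per claim an agent has handled
log = [
    {"segment": "motor",    "escalated": False, "human_override": False, "payout": 1200.0},
    {"segment": "motor",    "escalated": True,  "human_override": True,  "payout": 4800.0},
    {"segment": "property", "escalated": False, "human_override": False, "payout": 900.0},
]


def override_metrics(log):
    """Summarize how often agents escalate, how often humans step in,
    and how payouts look per segment."""
    total = len(log)
    payouts = defaultdict(list)
    for rec in log:
        payouts[rec["segment"]].append(rec["payout"])
    return {
        "escalation_rate": sum(r["escalated"] for r in log) / total,
        "override_rate": sum(r["human_override"] for r in log) / total,
        "avg_payout_by_segment": {s: sum(v) / len(v) for s, v in payouts.items()},
    }


print(override_metrics(log))
```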
As we argued in When the agent takes over, the right metric isn’t model accuracy. It’s the share of meaningful, high-trust work your agents can fully own. Override logic is how you scale that ownership safely. It tells each agent: “This part is yours. That part isn’t.” And it tells the business where to draw the line next.
From principles to practice: Managing a digital workforce
Autonomous claims agents don’t just process, they operate. They make calls, route edge cases, and represent the carrier’s judgment at scale. But like any colleague, they need structure: a job description, escalation rights, performance reviews, and a manager who knows when they’re slipping. This isn’t governance as a safety net. It’s management infrastructure for a workforce you no longer see.
Agents don’t need constant supervision, but they need a system that reminds them where their span of responsibility ends.
Getting started: It’s all about building the team
You don’t scale this overnight. You scale it like you would a new team: one pilot, one set of expectations, one management system at a time.
Have a “sit down” with your AI agents. Where are they making confident calls? Where are they winging it? Run backtests using the minimum-touch test to find claims they should have escalated, and claims they escalated but didn’t need to.
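One way such a backtest could be wired up, reusing the minimum_touch_lane sketch from earlier; the history format and comparison logic are assumptions, not a prescribed method:

```python
def backtest(history, lane_fn=minimum_touch_lane):
    """Compare how claims were actually handled with the lane the
    minimum-touch test would assign.

    `history` is a list of (claim, actual_lane) pairs from past decisions.
    Returns claims that were under-escalated and over-escalated.
    """
    missed, unnecessary = [], []
    for claim, actual_lane in history:
        expected = lane_fn(claim)
        if expected != Lane.ZERO_TOUCH and actual_lane == Lane.ZERO_TOUCH:
            missed.append(claim)       # should have been escalated, was not
        elif expected == Lane.ZERO_TOUCH and actual_lane != Lane.ZERO_TOUCH:
            unnecessary.append(claim)  # escalated, but did not need to be
    return missed, unnecessary
```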
Give one AI agent team, on one product line, a new operating model: override rules, live dashboards, an audit rhythm. Treat the pilot not as a tech test, but as a role definition exercise. You’re not tuning models. You’re shaping behavior.
Promote what works. Build AI agent-level performance views, connect override actions to human handlers, and install drift telemetry that lets you manage your AI workforce like a real workforce, with clarity, accountability, and feedback loops.
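Drift telemetry can start simple. A minimal sketch, assuming approval decisions logged as booleans and a placeholder alert threshold:

```python
def approval_drift(baseline, recent, alert_threshold=0.10):
    """Flag drift when the recent approval rate moves away from the baseline.

    Both arguments are lists of booleans (True = approved). The ten-point
    threshold is a placeholder to tune per product line.
    """
    baseline_rate = sum(baseline) / len(baseline)
    recent_rate = sum(recent) / len(recent)
    drift = recent_rate - baseline_rate
    return {"baseline": baseline_rate, "recent": recent_rate,
            "drift": drift, "alert": abs(drift) > alert_threshold}


# An agent that used to approve 80% of claims and now approves 60% trips the alert
print(approval_drift([True] * 80 + [False] * 20, [True] * 60 + [False] * 40))
```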
This isn’t about controlling the machine. It’s about managing the colleague. If agents are now making claims decisions in your name, your job is no longer to approve every outcome. Your job is to define the boundaries of ownership, and hold the system accountable when it forgets them.