In February a board approved a customer-facing chatbot. Three vendors, 14 months to first live conversation, a budget north of £1.4m, a steering committee that meets fortnightly, and a roadmap whose first six months are integrations the team does not yet have. Everyone in the room nodded. Everyone in the room knew what the slide was supposed to look like.
In the same building, the ops director for finance shipped an AP reconciliation agent in seven weeks on a £42,000 budget. No steering committee. The agent now handles 68% of incoming invoices end to end. It saves 4.2 hours per AP clerk per week across 12 clerks, which annualises to roughly £180,000 in recovered capacity, before counting the late-payment fees the team stopped paying. Nobody asked the ops director to present at the board.
That gap, between the project that gets the budget and the project that returns the money, is the most consistent pattern in this work. The agents that pay back fastest are the ones nobody wants to demo. They are not customer-facing. They do not have a personality. They read a document or a record, classify it, and route it. They are boring on a slide and devastating on a P&L.
What follows is three workflow shapes from three different industries. Finance, insurance, real estate. Different surface, same underlying pattern. If a team can recognise the pattern, they can find their own version of it inside their own back office in an afternoon.
The three properties that make a workflow agent-tractable
Before the examples, the shape.
The first property is structured input. The work starts from a document, a form, an email that follows a template, or a system record. It does not start from an open-ended conversation. The agent is reading something with a shape, not interpreting something with a mood.
The second is repeatable shape. The work follows the same broad steps every time. Variants are bounded. There are four or five flavours of the case, not five hundred. The agent does not need to invent the procedure on the fly.
The third is low individual judgement per case. Each case has a clear right answer, or a clear escalation path when the answer is not clear. It is not a question of taste, brand voice, or relationship. The agent is not deciding what tone to take with a key account. It is deciding whether a number sits inside or outside a tolerance band.
Together these three properties make a workflow both agent-tractable and audit-tractable. Agent-tractability decides whether the thing works. Audit-tractability decides whether anyone trusts it once it does.
Most internal pitches fail at least one of the three. The customer-facing chatbot fails structured input. The analyst co-pilot fails repeatable shape. The brand-voice writing assistant fails low individual judgement. None of those projects are unbuildable. They are bad first projects, and they will land better after a team has shipped one of the boring ones and learned what production behaviour looks like.
Workflow 1: AP three-way match
An accounts payable clerk spends their shift matching incoming invoices against purchase orders and goods-receipt notes, then flagging discrepancies. The work moves across three to five systems: the ERP, the procurement platform, a warehouse management system, the invoice intake mailbox, and a spreadsheet of negotiated price overrides nobody wants to talk about.
Take a UK retailer's finance team processing 14,000 invoices a month. Pre-agent, the team of 12 clerks worked through an exception rate of roughly 38%, with each exception taking 9 to 14 minutes of cross-system stitching. Net touch time across the team was about 1,600 hours a month.
The agent reads each invoice, queries the PO and goods-receipt systems, and classifies the match. The categories are bounded: exact match, quantity variance, price variance, missing PO, duplicate. Routine matches inside tolerance auto-post. Variances above tolerance route to a clerk with the three records placed side by side, the deltas highlighted, and a suggested resolution.
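The classification step can be sketched in a few lines of Python. This is a minimal illustration, not the retailer's implementation: the field names, the single-line invoice, the exact-quantity rule, and the 2% price tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set, Tuple


class Match(Enum):
    EXACT = "exact match"
    QUANTITY_VARIANCE = "quantity variance"
    PRICE_VARIANCE = "price variance"
    MISSING_PO = "missing PO"
    DUPLICATE = "duplicate"


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    quantity: int
    unit_price: float


@dataclass(frozen=True)
class PurchaseOrder:
    po_number: str
    unit_price: float  # assumed non-zero


# Hypothetical tolerance: relative price difference allowed to auto-post.
PRICE_TOLERANCE = 0.02


def classify(
    invoice: Invoice,
    po: Optional[PurchaseOrder],
    received_qty: int,
    seen_invoice_ids: Set[str],
) -> Tuple[Match, bool]:
    """Return (category, auto_post). auto_post=False means route to a clerk."""
    if invoice.invoice_id in seen_invoice_ids:
        return Match.DUPLICATE, False
    if po is None:
        return Match.MISSING_PO, False
    if invoice.quantity != received_qty:
        return Match.QUANTITY_VARIANCE, False
    price_delta = abs(invoice.unit_price - po.unit_price) / po.unit_price
    if price_delta == 0:
        return Match.EXACT, True
    # A price variance auto-posts only when it sits inside the tolerance band.
    return Match.PRICE_VARIANCE, price_delta <= PRICE_TOLERANCE
```

The point of the sketch is how small the decision space is: five categories, one tolerance check, and a boolean that says whether a human needs to look.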
Post-agent, the same team runs at a 14% exception rate. Touch time dropped to roughly 470 hours a month. At a fully loaded clerk cost of £28 an hour, the 1,130 hours saved each month put the annualised saving around £380,000. The team did not cut a single role. They redirected the capacity to vendor onboarding, payment-term renegotiation, and the rolling close, which is the work they always wanted to do and never had headroom for.
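For anyone who wants to rerun the numbers with their own inputs, the arithmetic is short. The figures below are the ones quoted above; the function name is just a label for this example.

```python
def annual_saving_gbp(hours_before: int, hours_after: int, cost_per_hour: int) -> int:
    """Annualised value of monthly touch time removed from the team."""
    hours_saved_per_month = hours_before - hours_after
    return hours_saved_per_month * 12 * cost_per_hour


# 1,600 hours a month before, 470 after, £28 an hour fully loaded.
saving = annual_saving_gbp(1_600, 470, 28)
print(f"£{saving:,}")  # £379,680
```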
The unglamorous part of this workflow is what makes it valuable. The same shape recurs in healthcare provider organisations, manufacturing, hospitality, construction. Same input, same classification, same routing, same payback. An agent built for the retailer transfers to a hospital with a rules tweak. The cross-system reconciliation pattern is the most under-priced agent opportunity in most back offices today.
Workflow 2: Claims FNOL intake
A policyholder reports a loss. They phone the call centre, fill in the web form, or open the carrier's app. An intake adjuster takes the report, types the structured fields into the claims system, classifies severity, and starts the triage. Most of the work is structured-data assembly. The judgement is a small fraction of it.
A national insurance carrier running motor and home claims fielded around 2,200 first-notice-of-loss reports a week. Average handle time on intake was 18 minutes. Of those 18 minutes, internal time-and-motion work showed 12 minutes was field population, 4 minutes was severity classification against a written rubric, and 2 minutes was the actual conversation the customer remembered.
The agent runs as a co-pilot on phone calls and as the front end on digital channels. It populates fields while the adjuster talks, classifies severity using the same rubric the adjuster was already using, escalates anything that crosses the severity threshold or hits a fraud indicator, and hands the adjuster a complete file rather than an empty form.
Average handle time fell from 18 minutes to 7. At 2,200 claims a week, that is roughly 400 adjuster hours a week recovered, or 20,800 hours a year. The carrier did not reduce headcount. They closed the backlog that had been sitting at 11 days and brought it inside 48 hours, which moved the customer NPS on intake by 14 points in a quarter. The customer never knew the agent existed. The part of the interaction the customer cared about was the one the human still handled.
What makes this workflow especially suited to a first build is the audit story. Every classification the agent makes is logged with its inputs and the rubric rule it triggered. A compliance lead can reconstruct any decision in seconds. That matters in regulated lines of business in a way it does not for a marketing chatbot, and it is one reason risk functions sign off on these projects faster than they sign off on the demo-friendly ones. The insurance use cases have a dozen variants of this same shape.
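One way to picture that audit trail is a rubric expressed as ordered rules, with every decision written to a log alongside its inputs and the rule that fired. The sketch below is illustrative only: the rule IDs, thresholds, and claim fields are invented for the example and are not the carrier's rubric.

```python
import json
from datetime import datetime, timezone

# Hypothetical rubric: ordered (rule_id, severity, predicate). First match wins.
RUBRIC = [
    ("R1-injury", "high", lambda c: c["injury_reported"]),
    ("R2-large-loss", "high", lambda c: c["estimated_loss_gbp"] >= 10_000),
    ("R3-undrivable", "medium", lambda c: not c["vehicle_drivable"]),
    ("R4-default", "low", lambda c: True),
]

# Severities that go straight to a human adjuster (assumed threshold).
ESCALATE = {"high"}


def classify_severity(claim: dict, audit_log: list) -> str:
    """Apply the rubric and append one JSON audit record per decision."""
    for rule_id, severity, predicate in RUBRIC:
        if predicate(claim):
            entry = {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "claim_id": claim["claim_id"],
                "inputs": {k: v for k, v in claim.items() if k != "claim_id"},
                "rule": rule_id,
                "severity": severity,
                "escalated": severity in ESCALATE,
            }
            audit_log.append(json.dumps(entry, sort_keys=True))
            return severity
    raise ValueError("rubric has no catch-all rule")
```

Because each record carries the inputs and the rule ID, reconstructing a decision is a log lookup rather than an investigation.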
Workflow 3: Lease abstraction
A commercial lease runs 40 to 120 pages. Inside it sit roughly 30 structured fields that asset managers, brokers, and acquisitions teams need: commencement date, term, base rent, escalation schedule, service-charge treatment, renewal options, termination rights, default cures, security deposit, exclusive-use clauses, and so on.
An associate or analyst pulls those fields by reading the lease end to end and typing them into an abstract template. The work takes 95 to 120 minutes per lease. Fully loaded cost of that associate is £60 to £110 an hour.
An agent reads the same lease in 4 to 7 minutes at an API cost of a few pence. The output drops into the abstract template. The analyst spot-checks the high-stakes fields (rent escalation, break clauses, indemnities) rather than transcribing them from scratch.
A mid-sized commercial team handling 200 leases a year recovers 320 to 380 hours, which sits between £19,000 and £42,000 in raw associate time. A portfolio team running 1,200 leases a year clears £120,000 to £250,000. The bigger gain is the one that never shows up in the spreadsheet. The associate gets their time back for the renewal-strategy work that requires their training. The boring task is the one the agent absorbs. The intellectually demanding task is the one the human keeps. More on the variants in our notes on the real estate use cases.
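The ranges above can be reproduced with a short calculation. One assumption is baked in: the net minutes recovered per lease (96 to 114) are back-derived from the 320 to 380 hours quoted for 200 leases, not stated directly in the figures.

```python
# Net analyst minutes recovered per lease (low, high). Assumption: derived
# from 320 to 380 hours across 200 leases, i.e. 96 to 114 minutes each.
MINUTES_RECOVERED = (96, 114)
# Fully loaded associate cost in GBP per hour (low, high), as quoted.
RATE_GBP = (60, 110)


def lease_saving_range(leases_per_year: int) -> dict:
    """Low and high estimates of hours and GBP recovered per year."""
    low_hours = leases_per_year * MINUTES_RECOVERED[0] / 60
    high_hours = leases_per_year * MINUTES_RECOVERED[1] / 60
    return {
        "hours": (round(low_hours), round(high_hours)),
        "gbp": (round(low_hours * RATE_GBP[0]), round(high_hours * RATE_GBP[1])),
    }


for volume in (200, 1_200):
    print(volume, lease_saving_range(volume))
```

The low end pairs the smallest time saving with the cheapest associate and the high end pairs the largest with the most expensive, which is why the range is wide. The results land close to the rounded figures quoted above.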
What the three workflows have in common
Lay the three side by side and the pattern is hard to miss. None of them would survive a beauty contest against a customer-facing chatbot in a steering committee. All of them paid back inside seven weeks. All three share the same five features:
- high volume
- structured input
- repeatable shape
- low individual judgement per case
- a clear human override path
When a team is hunting for their first agent, those five features are the screen. If a candidate workflow has all five, it is a strong first build. If it is missing one, the project gets harder. If it is missing two, it is not first-build material.
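The screen is simple enough to write down as a checklist function. This is a sketch of the rule as stated, with hypothetical feature keys; the judgement about whether a workflow really has each feature still belongs to the team.

```python
FEATURES = (
    "high_volume",
    "structured_input",
    "repeatable_shape",
    "low_individual_judgement",
    "clear_human_override",
)


def screen(workflow: dict) -> tuple:
    """Return (verdict, missing_features) for a candidate first build."""
    missing = [f for f in FEATURES if not workflow.get(f, False)]
    if not missing:
        verdict = "strong first build"
    elif len(missing) == 1:
        verdict = "harder project"
    else:
        verdict = "not first-build material"
    return verdict, missing


# Example: a customer-facing chatbot, scored the way the text describes it.
chatbot = {
    "high_volume": True,
    "structured_input": False,
    "repeatable_shape": False,
    "low_individual_judgement": True,
    "clear_human_override": True,
}
print(screen(chatbot))
```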
The buyers who internalise this, who can resist the urge to chase the demo-friendly project first, ship more agents in their first year than the buyers who cannot. Not because they are better engineers. Because they picked targets that could be hit. Each shipped agent funds the next one and builds the operational vocabulary the organisation needs for the harder builds later.
The exception, after three boring wins
There is a sequencing argument inside this. Once a team has shipped two or three of the unglamorous workflows, something changes. They now know how their agents handle exceptions in production. They have a monitoring story. They know how to write a rubric. They have a working pattern for human-in-the-loop. They have stakeholders who have seen an agent behave well and an agent behave badly and can tell the difference.
That team has earned the right to take on one harder project. Usually it is the customer-facing one: the chatbot, the co-pilot, the explainer dashboard. They will still find it hard. They will not find it impossible.
The team that tries the customer-facing project first, before they have shipped anything unglamorous, ships either nothing or something they quietly take down inside 12 months. The agents that get pulled are almost always the ones built before the team learned what they were doing.
Sequencing is the unfashionable lever in this work. It does not feature in vendor pitches. It does not show up in capability maps. It is the single biggest predictor of whether a programme ships three agents in its first year or zero.
The one-question test
For any workflow a team is considering for their first build, ask one question. If the input to this work disappeared tomorrow, would the worker have nothing to do?
If the answer is yes, the workflow has structured input. The AP clerk without invoices has nothing to match. The intake adjuster without an FNOL form has nothing to type. The real-estate analyst without a lease PDF has nothing to abstract. These are the workflows that pay back in weeks.
If the answer is no, because the worker would simply improvise, the work has too much open-ended judgement for a first build. The strategist without a brief still has things to say. The brand writer without a prompt still has a voice to apply. The relationship manager without a meeting still has a customer to call. These are not bad projects. They are bad first projects.
Every organisation that has shipped agents at scale has the same arc behind them. The first wave of wins is unglamorous. Documents in, structured fields out. Records reconciled. Forms triaged. None of it makes the keynote. All of it makes the budget for what comes next.
When the budget review comes round at the end of year one, the team with three boring agents in production has numbers. Hours saved, errors caught, cycle time reduced. The team that swung for the customer-facing project has a roadmap and an apology. Those two teams get treated differently by their finance committees, and the difference compounds in year two.
The agent that ships is the one with the right first workflow. The first workflow is almost always the boring one. The teams that recognise this and act on it spend their first year building capability. The teams that do not spend their first year building slide decks.