The real problem is not technical: it is visibility
When a company deploys an AI agent, the initial conversation tends to revolve around what the agent can do. Rarely does anyone discuss how the company will know whether the agent is doing it well.
The typical result: the technical team has access to logs, traces, and infrastructure metrics. The CFO or COO receives, at best, a monthly report with screenshots. At worst, they receive nothing and assume it is "working" because no one is complaining.
That is not governance. That is faith.
An AI agent is a business process. And like any business process, it must be measurable with indicators that leadership can read, interpret, and use to make decisions—without requiring technical intermediaries.
This article describes the four metrics that make exactly that possible.
Metric 1: Autonomous resolution rate
This is the most direct metric. It measures what percentage of tasks assigned to the agent are completed without human intervention.
If the agent handles expense approval requests, for example, the autonomous resolution rate indicates how many of those requests it processes on its own, versus how many require someone on the team to step in.
A well-calibrated agent should autonomously resolve between 70% and 90% of the cases it was designed for. If that number drops below 60%, there is a problem: either the agent is not well prepared for the actual mix of cases it receives, or the scope defined at implementation does not match what the business is actually asking of it.
This metric requires no technical access. It is a number. It can be placed in a management dashboard alongside the rest of the operational KPIs.
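Behind the dashboard, the calculation is simple. The Python sketch below assumes each task is logged with a hypothetical outcome label ("autonomous" or "escalated"). The 60% and 70% thresholds come from the ranges above. The status labels are illustrative, not a standard.

```python
def autonomous_resolution_rate(outcomes: list[str]) -> float:
    """Share of tasks completed without human intervention."""
    if not outcomes:
        raise ValueError("no tasks recorded for this period")
    resolved = sum(1 for outcome in outcomes if outcome == "autonomous")
    return resolved / len(outcomes)


def classify_resolution_rate(rate: float) -> str:
    """Map a rate onto the bands described in the text (labels are hypothetical)."""
    if rate < 0.60:
        return "problem"
    if rate < 0.70:
        return "below target"
    return "on target"


# Example: 78 tasks resolved autonomously, 22 escalated
rate = autonomous_resolution_rate(["autonomous"] * 78 + ["escalated"] * 22)
print(rate)                            # 0.78
print(classify_resolution_rate(rate))  # on target
```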
Metric 2: Escalation rate
This metric complements the previous one. It measures how often the agent transfers a task to a human because it cannot, or should not, resolve it on its own.
Escalation is not necessarily a problem. A well-designed agent escalates when appropriate: cases outside its scope, situations that require judgment, exceptions that were not anticipated. The problem arises when the escalation rate is systematically high or when it rises without apparent reason.
A sustained increase in the escalation rate is an early warning signal. It may indicate that the volume of atypical cases is growing, that the agent needs adjustment, or that there has been a change in the business process that was not reflected in the agent's configuration.
For the CFO, this metric functions as an operational risk indicator. If the escalation rate rises, there is something to review before it becomes a larger problem.
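"Sustained increase" needs an operational definition before it can trigger a review. One possible definition, offered here as an assumption and not a standard, is that the weekly escalation rate has risen in each of the last three consecutive weeks. A minimal Python sketch:

```python
def escalation_rate(escalated: int, total: int) -> float:
    """Fraction of tasks the agent handed over to a human."""
    if total <= 0:
        raise ValueError("total must be positive")
    return escalated / total


def sustained_increase(weekly_rates: list[float], periods: int = 3) -> bool:
    """True if the rate rose in each of the last `periods` week-over-week comparisons.

    The three-week window is a hypothetical default; tune it to your own volume.
    """
    if len(weekly_rates) < periods + 1:
        return False  # not enough history to judge a trend
    recent = weekly_rates[-(periods + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


print(escalation_rate(22, 100))                       # 0.22
print(sustained_increase([0.18, 0.19, 0.22, 0.25]))   # True
print(sustained_increase([0.18, 0.24, 0.21, 0.25]))   # False
```

Requiring several consecutive rises filters out a single noisy week, at the cost of raising the alert a little later.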
Metric 3: Cycle time per task
This measures how long the agent takes to complete a task from the moment it receives it to the moment it delivers the result.
This metric has two uses. The first is comparative: how long did that same process take before the agent? If the invoice reconciliation process took 4 hours with manual intervention and now takes 12 minutes, that is a concrete business data point, not a marketing promise.
The second use is continuous monitoring. If cycle time starts to grow without any change in work volume, it may indicate a performance issue, a failing external dependency, or a backlog of unresolved cases.
Cycle time is a metric that any operations director understands immediately. It requires no technical translation.
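Both uses can be computed from two timestamps per task. The sketch below assumes each task record carries hypothetical "received" and "delivered" times. The 20% growth and 5% volume-tolerance thresholds are illustrative assumptions, not figures from this article.

```python
from datetime import datetime
from statistics import median


def cycle_times_minutes(tasks: list[tuple[datetime, datetime]]) -> list[float]:
    """Minutes from receipt to delivery for each (received, delivered) pair."""
    return [(delivered - received).total_seconds() / 60 for received, delivered in tasks]


def reduction_vs_baseline(baseline_minutes: float, agent_minutes: float) -> float:
    """Fraction of the pre-agent cycle time that has been eliminated."""
    return (baseline_minutes - agent_minutes) / baseline_minutes


def cycle_time_alert(prev_median: float, curr_median: float,
                     prev_volume: int, curr_volume: int,
                     max_growth: float = 0.20, volume_tolerance: float = 0.05) -> bool:
    """Flag median cycle time growing beyond max_growth while volume stayed roughly flat."""
    time_grew = curr_median > prev_median * (1 + max_growth)
    volume_flat = abs(curr_volume - prev_volume) <= prev_volume * volume_tolerance
    return time_grew and volume_flat


week = [(datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 9, 12)),
        (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 15))]
print(median(cycle_times_minutes(week)))   # 13.5
print(reduction_vs_baseline(240, 12))      # 0.95 (4 hours down to 12 minutes)
print(cycle_time_alert(12, 16, 800, 810))  # True: slower, same volume
```

Using the median instead of the mean keeps a handful of unusually slow tasks from distorting the weekly figure.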
Metric 4: Cost per task
This is the metric that interests a CFO most and, paradoxically, the one least often instrumented in agent deployments.
Every time an agent executes a task, it consumes resources: language model calls, compute time, integrations with external systems. That consumption has a cost. If that cost is not measured per task, it is impossible to calculate the agent's real ROI.
The calculation is not complex. If the agent processes 800 requests per month and the total monthly infrastructure cost of the agent is EUR 400, the cost per task is EUR 0.50. If the same process cost EUR 8 per task with human intervention, the saving is EUR 7.50 per task, or EUR 6,000 per month on that specific process.
Those figures vary depending on the type of process, volume, and complexity. But the calculation mechanism is the same. And it is a calculation the CFO can perform independently, without needing the technical team to translate it.
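The same arithmetic, written as a short Python sketch that reproduces the figures above (function names are illustrative):

```python
def cost_per_task(monthly_cost_eur: float, tasks_per_month: int) -> float:
    """Total monthly agent cost divided by the number of tasks it processed."""
    if tasks_per_month <= 0:
        raise ValueError("tasks_per_month must be positive")
    return monthly_cost_eur / tasks_per_month


def monthly_saving_eur(baseline_cost_per_task: float, agent_cost_per_task: float,
                       tasks_per_month: int) -> float:
    """Saving per task multiplied by monthly volume."""
    return (baseline_cost_per_task - agent_cost_per_task) * tasks_per_month


agent_cost = cost_per_task(400, 800)
print(agent_cost)                              # 0.5
print(monthly_saving_eur(8, agent_cost, 800))  # 6000.0
```

Like the worked example in the text, this assumes every task is handled at the agent's cost. Escalated tasks still consume human time, so in practice their cost belongs in the comparison as well.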
A concrete example: purchasing department at an industrial company
An industrial distribution company with 80 employees deployed an agent to manage the purchase order validation process: verifying approval limits, cross-referencing available budget, and routing the request to the appropriate approver.
Before the agent, that process took between 6 and 24 hours depending on team availability. With the agent, 78% of cases are resolved autonomously, with an average cycle time of 18 minutes. The remaining 22% are escalated to the person responsible because they involve exceptions or amounts outside policy.
The monthly cost of the agent, including infrastructure and governance, ranged between EUR 300 and EUR 500. The purchasing team recovered 25 to 35 hours per month previously spent on routine validations.
Those four metrics (resolution rate, escalation rate, cycle time, and cost per task) were available in a dashboard the COO reviewed every week, without needing to speak with the technical team.
What to do if your company does not have these metrics today
If your company already has agents in production and cannot answer these four questions with concrete data, you have a governance problem, not a technology problem.
If you are evaluating deploying agents and no one has discussed how you will measure their performance, that is a signal that the proposal on the table is incomplete.
Metrics are not an add-on. They are part of the agent's design. An agent without business instrumentation is a process without controls.
If you want to review which metrics make sense to instrument in your case—or assess whether the agents you already have are generating the expected return—request a free diagnostic. It is a short form, with no immediate call required, and we will return a concrete assessment in under 48 hours.