The Problem with "Measuring AI Impact"
When a board asks what results the AI investment is producing, the typical answer is one of two things: a fabricated number or an uncomfortable silence.
The first happens when the team reports activity metrics (how many agents were built, how many tasks were automated) as if they were results, without connecting them to business outcomes. The second happens when no one established a baseline before starting.
Both problems share the same root cause: no one defined what to measure before deployment.
This article proposes a measurement framework for the first 90 days of an AI agent. It is designed so that a CFO can present it to the board without relying on technical jargon.
Before You Measure: The Baseline
No metric has value without a point of comparison. Before activating any agent, you must document the current state of the process being automated.
The three questions to answer before day 1:
- How many hours does the team spend on this task per week?
- What is the current error rate — rework, corrections, escalations?
- What does that process cost today, including people's time at an hourly rate?
Without this information, the first 90 days produce data without context. With it, they produce a financial argument.
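The three baseline questions can be captured in one small record before day 1. The following is a minimal sketch, not a prescribed tool: the class and field names are hypothetical, the 14 hours and 25 €/hour are borrowed from the manufacturing example later in this article, and the 10% error rate is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Pre-deployment snapshot of one process (names are illustrative)."""
    weekly_hours: float     # hours the team spends on the task per week
    error_rate: float       # share of records needing rework, 0.0 to 1.0
    hourly_cost_eur: float  # average hourly cost of the people involved

    def weekly_labor_cost(self) -> float:
        # People's time only; tools and management time would be added separately.
        return self.weekly_hours * self.hourly_cost_eur


# Placeholder values: error_rate=0.10 is an assumption, not measured data.
baseline = Baseline(weekly_hours=14.0, error_rate=0.10, hourly_cost_eur=25.0)
print(baseline.weekly_labor_cost())  # 350.0
```

Freezing the record is a deliberate choice: the baseline should not be edited once the agent is live.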
The Three Metric Categories That Matter
1. Time Recovered
This is the most immediate metric and the easiest to communicate. It measures how many hours of human work the automated process no longer consumes.
How to calculate it: take the weekly hours spent on the process before the agent, subtract the weekly supervision time the agent still requires, and multiply the result by the number of weeks in the measurement period.
A concrete example: an industrial manufacturing company with 180 employees was spending 14 hours per week consolidating production data from three separate sources to generate its weekly operations report. After implementing a consolidation and reporting agent, that process required only 2 hours of review. Over 12 weeks, the team recovered approximately 144 hours. At an average cost of 25 €/hour for the profiles involved, that represents 3,600 € in reassigned time — not counting the value of having the report available Monday at 8:00 a.m. instead of Wednesday at noon.
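The arithmetic in this example can be reproduced in a few lines. This is a sketch that assumes supervision time is constant from week to week; the function and parameter names are illustrative.

```python
def hours_recovered(weekly_hours_before: float,
                    weekly_supervision_hours: float,
                    weeks: int) -> float:
    """Hours of human work the process no longer consumes over the period."""
    return (weekly_hours_before - weekly_supervision_hours) * weeks


# Figures from the manufacturing example: 14 h before, 2 h of review, 12 weeks.
recovered = hours_recovered(weekly_hours_before=14,
                            weekly_supervision_hours=2,
                            weeks=12)
value_eur = recovered * 25  # 25 EUR/hour average cost of the profiles involved

print(recovered)  # 144
print(value_eur)  # 3600
```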
2. Error and Rework Reduction
This category is harder to measure but more persuasive in front of a board, because it connects directly to operational risk.
What to measure: the number of errors detected in the process before the agent versus after. This includes manual corrections, escalations, duplicate or inconsistent records, and any rework that consumes additional time.
In processes with manual integration between ERP systems and spreadsheets — a common pattern in mid-size companies — error rates of 8 to 15% in data entry records are typical. An agent that validates and normalizes data at the point of capture can reduce that rate to 1–3% within the first few weeks. The impact is not only operational: it reduces the risk of decisions being made on incorrect data.
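A before-and-after comparison of error rates might be computed as follows. The record and error counts are hypothetical, chosen only to fall inside the ranges mentioned above; they are not measured data.

```python
def error_rate(errors: int, records: int) -> float:
    """Share of records that needed correction, escalation, or rework."""
    if records <= 0:
        raise ValueError("records must be positive")
    return errors / records


# Hypothetical counts for illustration only.
before = error_rate(errors=120, records=1000)
after = error_rate(errors=20, records=1000)
relative_reduction = (before - after) / before

print(f"{before:.1%} -> {after:.1%}")            # 12.0% -> 2.0%
print(f"{relative_reduction:.0%} fewer errors")  # 83% fewer errors
```

The comparison only holds if both periods count the same kinds of errors, so it helps to fix the definition of an error when the baseline is documented.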
3. Cost per Transaction
This is the metric that matters most to a CFO with a long-term view. It measures how much it costs to process one unit of work — an invoice, an order, a report, a query — before and after the agent.
How to calculate it: total process cost (people + tools + management time) divided by the volume of transactions processed in the period.
If the cost per processed invoice was 4.20 € before implementing the agent and is 1.80 € afterward, the difference of 2.40 € multiplied by annual volume is the financial argument. For a company that processes 2,000 invoices per month (24,000 per year), that is 57,600 € per year, a number that requires no translation.
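The invoice example can be checked with a short sketch. It uses `Decimal` rather than floats so that currency amounts stay exact. The function names are illustrative, and the 8,400 € monthly process cost is back-calculated from 4.20 € × 2,000 invoices; the article does not state it.

```python
from decimal import Decimal


def cost_per_transaction(total_cost: Decimal, volume: int) -> Decimal:
    """Total process cost divided by the transactions processed in the period."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return total_cost / volume


def annual_saving(cost_before: Decimal, cost_after: Decimal,
                  monthly_volume: int) -> Decimal:
    """Per-transaction difference scaled to twelve months of volume."""
    return (cost_before - cost_after) * monthly_volume * 12


# Assumed monthly total, implied by 4.20 EUR x 2,000 invoices.
print(cost_per_transaction(Decimal("8400"), 2000))  # 4.2

saving = annual_saving(Decimal("4.20"), Decimal("1.80"), monthly_volume=2000)
print(saving)  # 57600.00
```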
What Not to Measure in the First 90 Days
Some metrics appear relevant but generate more confusion than clarity during this period:
- Internal user satisfaction: too subjective and variable in the first weeks of adoption.
- Number of agents deployed: an activity metric, not an outcome metric.
- Agent response speed: relevant in stable production, not during the pilot phase.
- Total ROI of the AI program: premature at 90 days if additional initiatives are underway.
The focus in the first 90 days must remain narrow: one process, clear metrics, comparison against the baseline.
How to Structure the Board Report
A 90-day report that a CFO can defend does not need more than one page. The recommended structure:
- Automated process: a one-line description of what the agent does.
- Baseline: state of the process before deployment (hours, errors, cost).
- 90-day result: the three metrics with actual data.
- Cost of the agent during the period: infrastructure, licenses, configuration time.
- Balance: difference between cost and value recovered.
- Next step: which process is automated next and why.
This format answers the question a board always asks: was it worth it? And if the answer is yes, it opens the conversation about what comes next.
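The balance line of the report is a simple subtraction, sketched below. The 3,600 € of value recovered comes from the time-recovered example earlier in the article. The 1,500 € agent cost and the next step are hypothetical placeholders, as are the class and field names.

```python
from dataclasses import dataclass


@dataclass
class NinetyDayReport:
    process: str                # one-line description of what the agent does
    value_recovered_eur: float  # value from the three metrics over 90 days
    agent_cost_eur: float       # infrastructure + licenses + configuration time
    next_step: str

    @property
    def balance_eur(self) -> float:
        # Positive means the agent returned more than it cost in the period.
        return self.value_recovered_eur - self.agent_cost_eur


report = NinetyDayReport(
    process="Consolidates production data into the weekly operations report",
    value_recovered_eur=3600.0,
    agent_cost_eur=1500.0,  # hypothetical: the article gives no agent cost
    next_step="Invoice processing",  # hypothetical
)
print(f"Balance: {report.balance_eur:,.0f} EUR")  # Balance: 2,100 EUR
```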
Conclusion
Measuring the first 90 days of an AI agent well does not require a data team or a complex dashboard. It requires discipline before deployment — establishing the baseline — and consistency throughout the pilot period.
Companies that do this well have a concrete advantage: they can scale with evidence, not with faith. Each new agent is justified by the data from the previous one.
If you are implementing an agent, or evaluating whether to do so, and want to define the measurement framework before you begin, you can request a free diagnostic using the form below. There is no need to schedule a call right away.