CFOs have a new AI problem - they know the bill is coming, just not how much

As enterprises race to deploy AI copilots and autonomous agents, finance leaders are finding the hardest part isn’t buying AI – it’s predicting what it will cost, proving its value, and stopping it from quietly running up a six-figure bill.

For years, the traditional enterprise software budgeting process was relatively straightforward. Buy a thousand licences, negotiate a discount, and forecast the cost for the next three years.

The explosive rise of AI – not to mention the constant pressure from vendors that you “must” adopt AI or fall behind – is changing this. We’ve already seen AI costs rise, with many once-free models becoming increasingly limited and paid models changing their pricing plans.

According to recent research from the Wall Street Journal, finance leaders are increasingly struggling to track AI usage because many AI platforms are billed on consumption rather than fixed subscriptions. Instead of paying per user, organisations are paying for tokens, inference requests, agent actions, and model usage, all of which can fluctuate dramatically from month to month. Only 26% of companies surveyed by KPMG said they had a comprehensive understanding of their AI costs.

You can appreciate why concern is growing among CFOs, not to mention CEOs, CIOs, and everyone else charged with watching the balance sheet: are we actually getting a return on our AI investment?

One of the most interesting developments of the past year is that AI adoption is no longer the key metric. Nobody is impressed that employees are using ChatGPT, Copilot, Claude, Gemini or an internal AI chatbot.

The board wants to know tangible figures: how much time did AI save? How much revenue did it generate? How much cost did it eliminate? How much risk did it reduce? (Let’s not mention how much risk vibe-coding might well introduce.)

Unfortunately, many organisations still cannot answer those questions, nor even know how to begin doing so. Industry observers note AI initiatives are often measured by adoption rates rather than business outcomes, making it difficult to establish a clear return on investment.

This is creating what some analysts are calling an “AI productivity paradox”. Individual workers may feel more productive, but organisations struggle to demonstrate measurable financial gains. Meanwhile, employees spend significant time checking and correcting AI output, reducing some of the expected benefits.

In other words, AI might save an employee two hours a week while simultaneously adding thousands of dollars in monthly model costs.
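To make that trade-off concrete, here is a rough back-of-the-envelope sketch. Only the two hours a week comes from the scenario above; the hourly cost, the team sizes and the model bill are assumptions chosen purely for illustration.

```python
# Hypothetical inputs: apart from the two hours a week, none of these
# figures come from the article's sources.
HOURS_SAVED_PER_WEEK = 2.0      # the "two hours a week" scenario
WEEKS_PER_MONTH = 52 / 12       # average number of weeks in a month
LOADED_HOURLY_COST = 60.0       # assumed fully loaded cost of one employee hour, USD


def monthly_time_value(hours_per_week: float, hourly_cost: float) -> float:
    """Dollar value of the time one employee saves in an average month."""
    return hours_per_week * WEEKS_PER_MONTH * hourly_cost


def net_monthly_value(employees: int, monthly_model_cost: float) -> float:
    """Value of the time saved across a team, minus the monthly model bill."""
    saved = employees * monthly_time_value(HOURS_SAVED_PER_WEEK, LOADED_HOURLY_COST)
    return saved - monthly_model_cost


# An assumed $3,000 monthly model bill, spread over two team sizes.
for team_size in (5, 10):
    print(team_size, round(net_monthly_value(team_size, 3000.0), 2))
```

Under these assumptions each employee's saved time is worth about $520 a month, so a $3,000 bill is a net loss for a team of five and a net gain for a team of ten. The point is not the specific numbers but that the answer flips depending on inputs most organisations are not yet tracking.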

This isn’t a new story. The cloud industry experienced something similar a decade ago. Companies enthusiastically migrated workloads to cloud platforms – because they believed they should, and again, because the market was telling them you must have a cloud strategy to remain competitive – before discovering that “pay as you go” often translated into “pay more than expected.”

Here we are again, and AI is following the same path.

A recent Reuters analysis described growing “AI sticker shock” as organisations discover that token-based billing can be surprisingly difficult to forecast. Coding agents, autonomous workflows and large-scale document processing can consume far more tokens than executives initially anticipated.

The problem becomes even more pronounced with agentic AI, where the AI isn’t simply responding to a prompt but can be set loose to do its own thing. And sure, these “own things” may be hugely beneficial. A good coding agent could work through a product backlog and solve bugs, add features, and do many wonderful things. But at what cost?

An AI agent might read dozens of documents, search multiple databases, call several APIs, generate multiple drafts, ask follow-up questions, and invoke other agents. Every one of those actions consumes tokens and compute resources.

An agent instructed to “research competitors and prepare a board report” may perform hundreds of model calls behind the scenes. The user sees a report. The finance department sees a bill.
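A minimal sketch of how one request fans out into a bill, assuming token-based pricing. The per-million-token prices, the call count and the token sizes below are all hypothetical, and real vendor pricing will differ.

```python
from dataclasses import dataclass


@dataclass
class ModelCall:
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per million tokens; real vendor prices differ.
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00


def call_cost(call: ModelCall) -> float:
    """Cost of a single model call at the assumed prices."""
    return (
        call.input_tokens * INPUT_PRICE_PER_MILLION
        + call.output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


def run_cost(calls: list[ModelCall]) -> float:
    """Total cost of one agent run, summed over every call it made."""
    return sum(call_cost(c) for c in calls)


# One user request that fans out into 300 calls, each re-reading a
# 20,000-token context and writing 1,000 tokens (assumed figures).
report_run = [ModelCall(20_000, 1_000) for _ in range(300)]
print(f"calls: {len(report_run)}, cost: ${run_cost(report_run):.2f}")
```

At these assumed prices a single call costs 7.5 cents, which looks harmless, but the 300-call run comes to about $22.50 for one report. Multiply that by every employee who asks for one and the forecasting problem becomes obvious.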

One thing that frustrates me is when people use AI instead of simply thinking; a recent Reddit post described a recruiter who used an LLM to generate a two-sentence response. I’ve personally seen team leaders use Claude to produce a lengthy document about features that could be enhanced in a report they wanted. I asked them which of the suggestions they wanted. They said they didn’t know; they had simply forwarded what Claude proposed, without even thinking about whether those features would actually add value or be useful. Yet they thought sending this list was somehow helpful, when it may well have resulted in wasted time and effort. But I digress.

The big question people need to ask themselves is whether AI agents can run up massive costs. The short answer is yes: poorly designed agents can generate surprisingly large expenses.

Common examples include recursive agent loops, excessively large context windows (which is why companies like Snowflake are now speaking about the importance of context), multiple agents talking to each other, unrestricted web searches, repeated document summarisation, and the use of premium models when smaller models would suffice.

A software developer experimenting with an autonomous coding agent may generate more AI consumption in a day than a traditional chatbot user consumes in a month.

This isn’t a theoretical problem; it’s genuinely happening. Uber executives recently discussed concerns about exhausting AI budgets far earlier than expected, while surveys show more than 80% of IT leaders have experienced unexpected AI cost increases.

One CIO recently observed that organisations are now facing not only cloud sprawl but also “agent sprawl” – an uncontrolled proliferation of AI agents across the enterprise.

These are not excuses to cut headcount, by the way. In fact, companies that justify layoffs on the basis of “AI” are being lazy and dishonest. However, that’s a different story. Right now, the important matter is to control AI spend. It’s time for “FinOps for AI” – just as cloud cost-management platforms emerged, a new generation of AI observability and governance tools is appearing.

Best-practice organisations are increasingly implementing per-user budgets, per-agent budgets, monthly token quotas, approval workflows for expensive models, automatic fallback from premium to cheaper models, real-time spend dashboards, and even chargeback to business units. In effect, AI is becoming another utility service that must be governed.
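Two of those controls, a per-agent budget and automatic fallback to a cheaper model, can be sketched in a few lines. This is an illustration only: the class, the model tier names and the 80% threshold are hypothetical and not taken from any particular governance product.

```python
class BudgetExceeded(Exception):
    """Raised when an agent has used up its monthly cap."""


class AgentBudget:
    """Tracks one agent's spend against a monthly cap (all names hypothetical)."""

    def __init__(self, monthly_cap_usd: float, fallback_at: float = 0.8):
        self.monthly_cap_usd = monthly_cap_usd
        self.fallback_at = fallback_at  # fraction of the cap that triggers fallback
        self.spent_usd = 0.0

    def choose_model(self) -> str:
        """Pick a model tier based on how much of the cap is already used."""
        if self.spent_usd >= self.monthly_cap_usd:
            raise BudgetExceeded(f"cap of ${self.monthly_cap_usd:.2f} reached")
        if self.spent_usd >= self.fallback_at * self.monthly_cap_usd:
            return "small-model"
        return "premium-model"

    def record(self, cost_usd: float) -> None:
        """Add the cost of a completed call to the running total."""
        self.spent_usd += cost_usd


budget = AgentBudget(monthly_cap_usd=100.0)
budget.record(50.0)
first = budget.choose_model()    # under 80% of the cap, so the premium tier
budget.record(35.0)
second = budget.choose_model()   # past 80% of the cap, so the cheaper tier
budget.record(20.0)              # now over the cap; the next request is refused
```

The design choice worth noting is that the check happens before each call rather than at month end. A hard stop like this also acts as a backstop against runaway agent loops, because a loop can only spend what is left in its budget.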

Perhaps the strangest AI trend of 2026 is “token-maxxing” – encouraging employees to consume as many AI tokens as possible. The logic is simple: more AI usage equals more productivity. Unfortunately, reality appears more complicated.

Several executives have begun pushing back against the idea that token consumption is a useful KPI. Replit’s Head of AI recently described internal token leaderboards as “very dystopian”, arguing that high token consumption doesn’t necessarily correlate with meaningful business outcomes.

And they’re right. Imagine measuring employee productivity by the number of emails sent, the number of AWS instances launched or, heaven forbid, the number of meetings attended (please don’t!).

Those metrics tell you activity occurred, but they tell you nothing about whether anything valuable happened. Token consumption is similar. The best AI program is not the one that consumes the most tokens; it’s the one that produces the greatest business value per token. A well-designed agent that solves a problem for $5 is far more valuable than one that spends $500 producing a prettier answer.

The lesson for CIOs, CTOs and CFOs is that, despite what salespeople will tell you, the future of enterprise AI won’t be determined by who uses the most AI. It will be determined by who can measure value most effectively. The winners will be organisations that can report figures such as cost per customer service case resolved, cost per software feature delivered, cost per qualified sales lead, cost per document processed, and so on. The question is not how many tokens you consumed, but how many dollars of business value each token created.
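As a rough sketch of what such unit-cost reporting might look like, the snippet below divides AI spend by outcomes delivered for each workflow. The log format and every figure in it are hypothetical; in practice the spend would come from vendor billing data and the outcomes from the relevant business system.

```python
from collections import defaultdict

# Hypothetical usage log: (workflow, AI spend in USD, outcomes delivered).
usage_log = [
    ("support", 120.0, 400),     # cases resolved
    ("support", 80.0, 100),
    ("coding", 900.0, 6),        # features delivered
    ("documents", 45.0, 3000),   # documents processed
]


def cost_per_outcome(log):
    """Return {workflow: dollars per outcome}, skipping workflows with no outcomes."""
    spend = defaultdict(float)
    outcomes = defaultdict(int)
    for workflow, cost_usd, delivered in log:
        spend[workflow] += cost_usd
        outcomes[workflow] += delivered
    return {w: spend[w] / outcomes[w] for w in spend if outcomes[w] > 0}


for workflow, unit_cost in cost_per_outcome(usage_log).items():
    print(f"{workflow}: ${unit_cost:.3f} per outcome")
```

With these made-up numbers, support works out at 40 cents per resolved case and coding at $150 per feature. Neither figure is good or bad on its own; each only means something when set against what a resolved case or a delivered feature is worth to the business.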

Until organisations can answer that question, AI may remain the fastest-growing line item in the budget – and the hardest one to justify.

https://itwire.com/business-it-news/enterprise-solutions/cfos-have-a-new-ai-problem-they-know-the-bill-is-coming-just-not-how-much