As enterprise AI accelerates, the most expensive question is no longer whether the tool works. It is whether the tool can prove its work, on demand, to someone with the authority to stop it.
The real bill comes due in the gap between the day a procurement decision is signed and the day the operations team has to deliver in front of an auditor, regulator, or customer. The pattern will be familiar to anyone who has sat through a post-deployment operational debrief.
I call it execution debt.
What execution debt is, and is not
Execution debt is the gap between what an AI contract promises and what the operations team must deliver to make the system work under real conditions.
It is not technical debt. Technical debt normally resides inside the codebase and engineers refactor it as required. Execution debt sits between the system and the business. It can only be repaid by people, and usually not the contract signers. That is why it rarely appears in the business case.
Signing off on procurement triggers the debt, which then silently accrues. Every layer of context left outside procurement becomes a charge the operations team carries on its own ledger. Language. Regulatory provenance. Integration with existing process. Audit defensibility. Training. Change management on the gemba front line. This is where the debt is either repaid or defaults.
The first symptom is rarely a system failure. It surfaces when the system comes up blank to a stakeholder question in front of someone who can act on the silence. By the time the debt becomes visible, it has already compounded for months. The contract is signed once. The deployment is negotiated every day afterwards.
Case one: the audit chain
On 2 April 2026, the FDA issued Warning Letter 320-26-58 to Purolea Cosmetics Lab in Livonia, Michigan. It is the first FDA cGMP enforcement action explicitly citing AI over-reliance as a manufacturing compliance violation.
Purolea had used AI agents to generate drug specifications, manufacturing procedures and master production records. The Quality Unit had not reviewed them. Process validation had not been conducted. The citations sit under 21 CFR 211.22(c) and 21 CFR 211.100.
When inspectors flagged the absence of process validation, the firm responded that the AI agent had not identified it as a requirement. A regulated pharmaceutical manufacturer told the FDA its compliance failure was the AI agent's fault. The agency had never before had to underline that sentence. It has now.
The pattern is not exclusive to pharma. Wherever AI-generated artefacts ship to production without a competent human in the loop to review them, the same accountability gap forms. Systems, used properly, can assist or accelerate the decision. They cannot own the underlying obligation.
The frameworks exist. The FDA's January 2025 draft guidance set out a seven-step credibility framework keyed to context-of-use and model risk. ISPE's July 2025 GAMP Guide on AI runs to 290 pages. Article 12 of the EU AI Act mandates automatic logging across the system lifecycle, an audit trail by statute rather than by goodwill. The frameworks exist. The evidence trail does not write itself.
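What Article 12-grade logging asks for is less exotic than it sounds. Below is a minimal sketch, assuming a plain append-only JSON Lines file; the field names, path and reviewer convention are my own illustrative choices, not a prescribed schema. The structural point is that every AI-generated artefact carries a timestamped record of what produced it and which named human signed it off, before anyone asks.

```python
# A minimal sketch of append-only lifecycle logging for AI-generated
# artefacts. Field names, path and schema are illustrative assumptions,
# not the EU AI Act's prescribed format.
import hashlib
import json
from datetime import datetime, timezone

LOG_PATH = "ai_audit_log.jsonl"  # hypothetical append-only log file

def log_ai_artefact(model_id: str, prompt: str, output: str,
                    reviewer: str, approved: bool) -> dict:
    """Record what was generated, from what input, by which model,
    and which named human reviewed it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "input_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "reviewer": reviewer,   # a named person, never the agent itself
        "approved": approved,   # the Quality Unit's decision, on the record
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# An AI-drafted procedure is logged before it can ship anywhere.
log_ai_artefact(
    model_id="vendor-model-v2",          # hypothetical identifier
    prompt="Draft the master production record for product X.",
    output="...generated document text...",
    reviewer="j.tanaka (Quality Unit)",  # hypothetical reviewer
    approved=False,
)
```

Run against the Purolea facts, a log like this permits only two answers: a Quality Unit signature, or its documented absence. Either way, the silence is gone.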
I remember GMP auditing with Japanese experts on site, some of them with 30 years at the coal face. Across the board, in every process, AI or otherwise, a binary perspective prevailed. Was there any risk, even theoretical, of a deviation? If so, it had to be rooted out and eliminated as far as possible. In a Japanese audit room, risk and uncertainty are not abstractions. They are obligations to discharge.
Provenance was always the compliance frontier. The concept predates AI. AI just brought that frontier into focus for people who hadn't had to look at it before.
Case two: the language layer
In February 2026, NTT DATA and NVIDIA published a joint study on Hugging Face testing NTT's tsuzumi-v2.0-28B-instruct model against a controlled set of Japanese legal documents. Baseline accuracy was 15.3 per cent. Using NVIDIA's Nemotron-Personas-Japan dataset to expand 450 seed samples into 138,000 synthetic training examples, accuracy rose to 79.3 per cent.
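To picture what turning 450 seeds into 138,000 examples means mechanically, here is a hedged sketch of persona-conditioned expansion. Every name in it is a placeholder of mine rather than the study's actual pipeline, which is documented alongside the dataset; only the shape of the fan-out matters.

```python
# Illustrative sketch of seed-to-synthetic data expansion. All names
# and the generation step are placeholders, not the published pipeline.
from itertools import product

seed_tasks = [
    "Summarise the indemnity clause in plain Japanese.",
    "Identify the governing-law provision in this contract.",
]  # stand-ins for the 450 curated legal seed samples

personas = ["corporate counsel", "paralegal", "compliance officer"]
registers = ["formal keigo", "internal memo"]

def generate_variant(seed: str, persona: str, register: str) -> str:
    """Placeholder for an LLM call that restates the seed task the way
    the given persona would phrase it, in the given register."""
    return f"[{persona} / {register}] {seed}"

synthetic = [
    generate_variant(seed, persona, register)
    for seed, persona, register in product(seed_tasks, personas, registers)
]

# 2 seeds x 3 personas x 2 registers = 12 variants here. At study
# scale, 450 seeds fanned out to 138,000 examples, roughly 300-fold.
print(len(synthetic))  # 12
```

The fan-out is the cheap part. Curating the 450 seeds that make it worth doing still takes people who know Japanese legal language.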
NTT DATA is the largest IT services company headquartered in Japan, with offices worldwide. If a Japanese technology leader has to engineer its way from 15.3 per cent to 79.3 per cent on legal-language tasks, the procurement question is not whether a tool supports Japanese. It is whether it understands your Japanese.
Which brings to mind a recent Databricks slide deck I translated for the Japanese subsidiary of a German client. Deciding which terms stayed in Roman letters and which went to katakana phonetics was like translating a French menu for a high-class Tokyo hotel. Which terms are already on the end client's radar? Which should be phoneticised? Which should be rendered into Japanese? Which require additional explanation? Multiply that across 64 slides and a six-week ringi approval cycle and you have the translation tax in plain sight. The vendor sold seamless multilingual support. The work landed on the senior engineers who stopped architecting systems and started maintaining a glossary.
Japan's workforce woes are no secret, but the IT engineering shortfall is particularly acute. The shortage is already projected to approach 800,000 by 2030 in METI's high-demand scenario. Every manual override becomes a luxury no enterprise can afford.
In enterprise adoption, language is infrastructure. It carries authority, risk and accountability. AI can economise on the production of words. Producing 信用 (shin'yō, trust) is another matter altogether.
The deeper pattern
Step back from the cases and the structural pattern is clear. AI is bought at the centre and absorbed at the periphery, where the exceptions live. The exceptions are where budget, accountability and trust are consumed.
The mechanism is the operating model, not the postcode.
Two figures show why the debt rises even as the model price falls.
Stanford's HAI AI Index 2025 documented a 280-fold drop in the inference cost of a GPT-3.5-equivalent model between November 2022 and October 2024. Twenty dollars per million tokens fell to seven cents. Hardware costs declined by 30 per cent annually. Energy efficiency improved by 40 per cent each year.
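Those figures hold together under a back-of-the-envelope check, assuming the numbers as stated:

```python
# Back-of-the-envelope check on the AI Index figures as stated.
cost_nov_2022 = 20.00  # USD per million tokens, GPT-3.5 equivalent
cost_oct_2024 = 0.07   # USD per million tokens

# ~286-fold, consistent with the reported 280-fold drop.
print(f"{cost_nov_2022 / cost_oct_2024:.0f}-fold drop")

# Two years of 30 per cent annual hardware cost decline roughly
# halves the hardware bill: 0.70 squared.
print(f"{(1 - 0.30) ** 2:.2f}")  # 0.49
```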
Total cost of ownership is moving the other way. The IBM Cost of a Data Breach 2025 Report places the global average at $4.44 million, down 9 per cent globally, but up to an all-time high of $10.22 million in the US. The figure that matters more for this conversation: 97 per cent of organisations breached through AI systems lacked proper AI access controls. Shadow AI was a factor in 20 per cent of breaches and added an average of $670,000 to incident cost.
The same report found that organisations using AI extensively in their security defences saved an average of $1.9 million per breach versus those that did not. The cheap part is the model. The expensive part is the deployment context the centre never priced in.
The EU AI Act underlines this directly. Articles 9, 10 and 12 require continuous risk management, data governance and lifecycle logging for high-risk AI systems. The high-risk obligations apply from 2 August 2026, subject to a Digital Omnibus simplification proposal that may shift the deadline. For procurement, that means evidence obligations must be priced before deployment, not discovered during remediation.
Gartner's July 2024 forecast that 30 per cent of generative AI projects would be abandoned after proof of concept by end-2025 has since been overtaken by the operational evidence. MIT's GenAI Divide research, published in mid-2025, found that 95 per cent of enterprise generative AI pilots were delivering no measurable return. The precise percentage matters less than the pattern. Pilots are easier to launch than to operationalise. Buying AI is now easier than absorbing it.
What this asks of procurement
The strongest enterprise AI buyers in 2026 will not be the fastest buyers. They will be the ones that price the periphery into the procurement decision before signature. The control point is not the steering committee after go-live. It is the statement of work before signature.
They will demand to know how the system shows its reasoning chain, in whose language, and under whose oversight. They will demand at least one person in the room who speaks both the vendor's English and the customer's operating culture. That person is not a bolt-on translator. They are the pattern-spotter who sees where the centre's assumptions fracture before the contract is signed.
Before signature, ask the vendor to produce the audit trail their system generated for a comparable customer in the past six months. If the answer is silence, the price on the contract is not the price.
This is where Smartsourcing earns its place.
Smartsourcing isn't a pricing model. It's a delivery discipline that puts native context inside the project from day one. Mixed-nationality teams. Native-language capability across the markets where the work actually lands. At least one person on the build team who speaks the customer's language and shares their cultural instincts, reading the room before the requirement is documented.
Dirox has been refining this since 2003. AI hasn't changed the principle. It has raised the price of getting it wrong. When the project review catches the risk before it spreads, especially in the crucial first three months, that isn't a feature. It is the absence of the silence that would otherwise have followed the auditor's question.
Closing
AI cuts the cost of the answer. The cost of being trusted with the question is rising. That remains the moat.
By Richard Mort, Strategic Intelligence Consultant

