Opus 4.7 regressed. The real story is what that says about compute.

By Oscar Espinoza, Alvento — 22 April 2026 — 5 min read

[Illustration: an isometric editorial image of a silicon chip and its surrounding circuit board, split down the centre by a jagged fissure glowing acid-lime green, rendered in charcoal stipple on matte black.]

Anthropic shipped Claude Opus 4.7 on 16 April. Within a day, side-by-side comparisons showing regressions were circulating. On the MRCR long-context benchmark, 4.6 scored 78.3%; 4.7 scored 32.2%. The new tokenizer produces roughly 35% more tokens on the same input, so bills go up even though the per-token price didn't change. Reliability complaints started the same day.
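The billing impact of a fatter tokenizer is easy to estimate. Here is a minimal sketch of the arithmetic, assuming illustrative traffic figures and a hypothetical per-million-token price (these are not Anthropic's published rates):

```python
# Rough cost impact of a tokenizer that emits ~35% more tokens for the
# same input. All figures below are ILLUSTRATIVE assumptions, not
# published pricing.

def monthly_cost(tokens_per_request: int, requests: int, usd_per_mtok: float) -> float:
    """Total monthly spend at a given per-million-token price."""
    return tokens_per_request * requests * usd_per_mtok / 1_000_000

# Hypothetical workload: 500k requests/month, 2,000 tokens each, $15/Mtok.
old = monthly_cost(tokens_per_request=2_000, requests=500_000, usd_per_mtok=15.0)
# Same workload, same price, but ~35% more tokens per request.
new = monthly_cost(tokens_per_request=int(2_000 * 1.35), requests=500_000, usd_per_mtok=15.0)

print(f"old: ${old:,.0f}  new: ${new:,.0f}  delta: {new / old - 1:.0%}")
```

The sticker price never moves; the token count does, and the bill follows it.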

The bigger story is what happened four days later. Anthropic announced up to $25 billion in new investment from Amazon, a $100 billion cloud commitment to AWS, and access to 5 gigawatts of Trainium capacity. Their own announcement uses the phrase "inevitable strain" about the infrastructure pressure from enterprise and consumer demand.

At the same time, Anthropic is running a withheld frontier model called Mythos. Mythos scored 93.9% on SWE-bench and 97.6% on USAMO. It's available to around 40 organisations through Project Glasswing, Anthropic's cybersecurity initiative. Every public signal points to that being where the best compute is going, and to Opus 4.7 shipping on what was left.

Three things to take from this if you run Claude in production

Version numbers aren't a clean proxy for progress any more

A higher version used to mean better across the board. It doesn't. Model releases now get shaped by infrastructure, research priorities and commercial deals you can't see from the outside. The same lab can ship a frontier system to 40 security partners and a degraded public version to everyone else in the same month.

You're a procurement function whether you want to be or not

Run 4.6 and 4.7 on your actual prompts before you switch anything. Pin your model version in production so you're not auto-upgraded into a regression. Build an evaluation harness now. If you haven't got automated regression testing for model behaviour, this is the quarter to sort it.
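A regression harness for the steps above doesn't need to be elaborate. Here is a minimal sketch of the pin-and-compare idea, with the actual API call stubbed out; the model IDs, cases, and stub reply are hypothetical placeholders, not real Anthropic identifiers:

```python
# Minimal model-regression harness: run the same prompts through two
# pinned model versions and compare pass rates before switching.
# `call_model` is a stub; swap in your real provider SDK call.
from typing import Callable

PINNED_CURRENT = "claude-opus-4-6"   # hypothetical pinned version
CANDIDATE      = "claude-opus-4-7"   # hypothetical upgrade candidate

# Each case: a prompt plus a predicate deciding whether the reply passes.
CASES = [
    ("Summarise this contract clause: ...", lambda out: len(out) > 0),
    ("Extract the total from: Total due: $1,234", lambda out: "1,234" in out),
]

def call_model(model: str, prompt: str) -> str:
    """Stub standing in for a real API call."""
    return f"[{model}] Total due: $1,234"

def run_suite(model: str, call: Callable[[str, str], str] = call_model) -> float:
    """Return the pass rate of the case suite for one model version."""
    passed = sum(1 for prompt, check in CASES if check(call(model, prompt)))
    return passed / len(CASES)

baseline  = run_suite(PINNED_CURRENT)
candidate = run_suite(CANDIDATE)
# Only promote the candidate if it doesn't regress on your own prompts.
safe_to_upgrade = candidate >= baseline
print(f"baseline={baseline:.0%} candidate={candidate:.0%} upgrade={safe_to_upgrade}")
```

The point isn't the stub; it's the shape. Pin the version string explicitly, keep the case suite in version control, and make "candidate beats baseline on our prompts" a gate, not a hope.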

Frontier lab dynamics belong in your risk model

Compute allocation, partnership deals, and research priorities at Anthropic, OpenAI and Google flow straight into the tools your business runs on. Those dynamics are not going to become more stable. Treat them the way you'd treat any other critical vendor dependency: with contracts, fallbacks, and a clear view of the blast radius if the supplier changes direction.
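On the fallback side, the "critical vendor" framing translates directly into code. Here is a minimal failover sketch with stubbed clients; the provider names and replies are hypothetical, and a real version would call actual SDKs:

```python
# Minimal vendor-failover wrapper: try providers in priority order and
# fall through on failure. Both clients are stubs for illustration.

def primary(prompt: str) -> str:
    raise RuntimeError("primary provider degraded")  # simulate an outage

def fallback(prompt: str) -> str:
    return "fallback reply"  # stub for a second vendor's client

PROVIDERS = [("primary", primary), ("fallback", fallback)]

def complete(prompt: str) -> tuple[str, str]:
    """Return (provider_name, reply), trying each provider in order."""
    errors = []
    for name, call in PROVIDERS:
        try:
            return name, call(prompt)
        except Exception as exc:
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

name, reply = complete("hello")
```

Even this toy version surfaces the hard questions: which requests may legally go to the second vendor, and how far apart the two models' behaviour is allowed to drift. That's the blast-radius conversation.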

The bigger point

AI compute isn't abundant any more. It's contested, expensive, and allocated strategically. Businesses that assume new releases are free upgrades will get burned. Businesses that treat AI the way they treat any other critical vendor won't.

If you're running Claude in production, checking whether 4.7 regresses on your own workload is worth a couple of hours of your time this week.

Deploying AI in a regulated environment? Alvento helps businesses deploy and govern AI systems in production — evaluation harnesses, version discipline, and vendor risk modelling. Book a diagnostic at alvento.ltd or email hello@alvento.ltd — first conversation is free.

Sources