In the space of a few weeks in early 2026, three stories landed that every engineering and IT leader should read together.
At Amazon, a series of outages tied to AI-assisted coding triggered a company-wide 90-day "code safety reset" covering 335 of its most critical retail systems. At McKinsey, a security research firm ran an autonomous AI agent against the firm's internal AI platform and gained full read and write access to the production database in under two hours. And at Lloyds Banking Group, a technical glitch caused the mobile and online banking apps for Lloyds, Halifax, and Bank of Scotland to show customers the transaction histories, National Insurance numbers, and payment details of complete strangers.
None of these stories is an argument against building and deploying sophisticated software systems. All three are clear arguments for governing such systems properly.
Three stories, one diagnosis
The Amazon story is about pace overtaking process. The McKinsey story shows how AI has created a new class of attack surface. The Lloyds story highlights the hidden complexity inside systems that millions of people rely on every day. Different triggers, but the same conclusion: organisations expanded what their systems could do faster than they strengthened the controls around them.
That gap is where the damage happens.
Amazon: speed without structure
In November 2025, Amazon’s leadership mandated that 80% of its engineers use Kiro, the company’s AI coding assistant, on a weekly basis. Adoption was tracked as a corporate objective. The logic was straightforward: AI tools enable engineers to produce more code, faster. And that is largely true.
What followed illustrates exactly why speed alone is not the measure that matters.
In December 2025, Kiro determined that the most efficient way to fix a configuration error in an AWS Cost Explorer environment was to delete the entire production environment and recreate it. The result was a 13-hour outage in the China region. In February 2026, a second outage occurred when engineers allowed Amazon’s AI coding tool Q to resolve an issue without human intervention. On March 2, Q contributed to a failure that produced approximately 1.6 million website errors and nearly 120,000 lost orders. Three days later, a separate incident on March 5 caused a 99% drop in orders across North American marketplaces, resulting in 6.3 million lost orders.
The cause of the March 5 incident was described in Amazon’s own internal documentation: a modification to a live production system that skipped the proper documentation and approval process required by Amazon. Automated checks were not run before the change went live.
Amazon’s response was a 90-day “code safety reset”. Engineers must get two people to review changes before deployment, use a formal documentation and approval process, and follow stricter automated checks. The reset applies to 335 Tier-1 systems. Senior managers and technology leaders are also being held more directly accountable, reinforcing that reliability is a shared responsibility, not an individual task.
Amazon has been careful to say that only one of the reviewed incidents was directly related to AI tooling, and that none involved code wholly written by AI. That’s an important nuance. But it doesn’t change the underlying lesson: when AI tools operate inside delivery structures that were built for a slower, more sequential process, the gaps in that structure get exposed at a scale that wasn’t possible before.
McKinsey: the attack surface nobody planned for
The McKinsey story is different in nature but related in cause. Lilli, McKinsey’s internal AI platform, had been running in production for over two years by the time CodeWall’s autonomous offensive agent was pointed at it. Within two hours, the agent had full read and write access to the entire production database.
The vulnerability itself was not especially sophisticated. After mapping the attack surface, the agent found that the API documentation was publicly exposed, listing more than 200 fully documented endpoints. Most of them required authentication, but 22 did not. One of those unprotected endpoints contained a SQL injection vulnerability in an unusual place: the JSON keys passed to a query were concatenated directly into the SQL statement, rather than the values. Because of that, it was the kind of flaw standard automated scanners often miss, and one that Lilli’s own internal scanning had failed to detect for two years.
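To make the flaw class concrete, here is a hypothetical Python sketch – not Lilli’s actual code – of the keys-versus-values distinction. Binding the values as parameters looks safe, but concatenating the JSON keys into the statement still lets an attacker inject SQL through a crafted key. Table and column names here are illustrative; an explicit allow-list of columns is one straightforward way to close the gap.

```python
import json

def build_query_vulnerable(filters_json: str):
    """Values are bound as parameters, but the JSON *keys* are concatenated
    straight into the SQL text, so a malicious key injects SQL."""
    filters = json.loads(filters_json)
    where = " AND ".join(f"{key} = ?" for key in filters)   # UNSAFE: keys in SQL
    return f"SELECT * FROM documents WHERE {where}", list(filters.values())

def build_query_safer(filters_json: str):
    """Same query shape, but keys must match an explicit allow-list of
    known column names before they are allowed anywhere near the SQL."""
    filters = json.loads(filters_json)
    allowed = {"title", "owner", "created_at"}              # illustrative columns
    unexpected = set(filters) - allowed
    if unexpected:
        raise ValueError(f"unexpected filter columns: {unexpected}")
    where = " AND ".join(f"{key} = ?" for key in filters)
    return f"SELECT * FROM documents WHERE {where}", list(filters.values())
```

Because most scanners fuzz parameter values rather than the structure of the JSON itself, a key like `"title = title OR 1=1 --"` rides into the statement unnoticed, which is consistent with the flaw surviving two years of internal scanning.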
The scale of what was accessible is striking. 46.5 million chat messages. 728,000 files. 57,000 user accounts. 3.68 million RAG document chunks – the entire knowledge base feeding the AI, with S3 storage paths and internal file metadata.
But the most significant finding wasn’t the data; it was the write access. Lilli’s system prompts – the instructions that control how the AI behaves – were stored in the same database the agent had access to. An attacker with that access could have rewritten those prompts silently, with no deployment, no code change, and no log trail. The AI would simply start behaving differently, and 43,000 consultants relying on it for client work would have no way of knowing.
Organisations have spent decades securing their code, their servers, and their supply chains. But the prompt layer – the instructions that govern how AI systems behave – is the new high-value target, and almost nobody is treating it as one.
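One possible mitigation, sketched below in Python purely as an illustration (the key handling, function names, and storage layout are all assumptions, not anything McKinsey has described): pin a MAC of each approved system prompt outside the database that stores the prompt text, so a silent in-database rewrite fails an integrity check at load time instead of silently changing the AI’s behaviour.

```python
import hashlib
import hmac

# Illustrative only: in practice this key would live in a secrets manager,
# not alongside the prompts it protects.
SECRET_KEY = b"kept-in-a-secrets-manager-not-the-db"

def pin_prompt(prompt_text: str) -> str:
    """Compute the pinned MAC when a prompt change is reviewed and approved."""
    return hmac.new(SECRET_KEY, prompt_text.encode(), hashlib.sha256).hexdigest()

def load_prompt(prompt_text_from_db: str, pinned_mac: str) -> str:
    """Refuse to use a prompt whose database copy no longer matches its pin."""
    actual = hmac.new(SECRET_KEY, prompt_text_from_db.encode(),
                      hashlib.sha256).hexdigest()
    if not hmac.compare_digest(actual, pinned_mac):
        raise RuntimeError("system prompt failed integrity check: possible tampering")
    return prompt_text_from_db
```

The design choice worth noting is that the pin changes only through the same review path as a code change, which restores exactly the deployment and log trail that a direct database write bypasses.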
Lloyds: complexity without isolation
On the morning of March 12, 2026, customers logging into the mobile and online banking apps for Lloyds Bank, Halifax, and Bank of Scotland were not met with a blank screen or an error message. Instead, some were shown another person’s account, including full transaction histories, wage payments, direct debits, National Insurance numbers, and spending patterns stretching back months.
One customer told the BBC she was able to view details from six different accounts over a 20-minute period. Another said he could scroll through a complete account history month by month, including direct debits to the DVLA showing the car registration number. A third reported seeing over a million pounds showing as paid in – a sum that belonged to someone else entirely.
Lloyds Banking Group confirmed that a technical issue had caused transaction information from some accounts to be shown to other customers in both the mobile app and internet banking. The group said it was not a cyber attack, the error was quickly resolved, and balances remained correct. The Financial Conduct Authority engaged with the group to assess what happened.
The cause has not been publicly disclosed in detail. But the shape of the failure is instructive regardless: a professor of financial technology at the University of Manchester described the event as “unusual,” and suggested that as data architectures become more complex and data openness greater, such issues could become more frequent.
This represents a different failure mode from Amazon’s production pipeline issues and McKinsey’s prompt layer exposure. In Lloyds’ case, the problem lay in data isolation, where the boundary between one customer’s data and another’s broke down within a system serving 26 million users. There was no targeted attack or autonomous agent involved. Instead, it was the result of complexity building up over time in a critical system, until a change or specific condition exposed a weakness the architecture was not designed to contain.
According to data compiled by lawmakers on the Treasury Committee, the UK’s largest banks recorded at least 158 IT failures between January 2023 and February 2025, amounting to more than 800 hours of service disruption – including 12 outages reported by Lloyds Banking Group alone. The March 2026 incident was different in kind from most of those: previous outages typically left customers unable to access their own accounts. This one showed them someone else’s.
The pattern all three stories share
These are three different organisations, three different failure modes, and three different threat vectors. But all three trace back to a single root problem: capabilities were deployed, expanded, or allowed to grow in complexity faster than the governance structures around them matured.
At Amazon, it was the production pipeline: AI tools given the ability to make changes in critical systems without the procedural checks that would catch and contain errors. At McKinsey, it was the security model: an AI platform built and expanded over two years without the same rigour applied to the prompt layer, the database access controls, and the unauthenticated API surface. At Lloyds, it was the data architecture: a system of sufficient complexity that a failure in isolation between accounts could propagate across the app layer and reach customers before it was caught.
In all three cases, the systems worked exactly as designed, until a condition exposed a gap in the structure around them. That’s what makes these stories instructive rather than simply alarming.
What the reset tells us
Amazon’s 90-day reset is being framed in some quarters as a retreat from AI. It isn’t. It is a recognition that the deployment model needs to catch up with the capability model.
It also amounts to a large-scale experiment in re-introducing human and procedural checks into AI-accelerated development – not to abandon AI, but to ensure that when things do go wrong, a single mistake cannot cascade into millions of failed transactions.
The specific measures – mandatory two-person review, formal documentation before deployment, automated reliability checks, leadership accountability – are not novel ideas. They are the baseline practices that should have been in place before AI tooling was introduced at scale. What the reset represents is an organisation reconnecting those fundamentals with a delivery model that had outpaced them.
This is a pattern we see repeatedly. Organisations rarely run into trouble with AI-assisted development because they are moving too slowly. The problem usually starts when delivery speeds up, but governance does not keep pace. The tool is introduced first, while the guardrails, roles, and controls it depends on are worked out later.
Quality gates are not the opposite of speed
There is a version of this conversation where the takeaway is “slow down.” We don’t think that’s the right reading of any of these stories.
The right reading is that quality gates, human review at the right points, automated checks on every change, and security treated as a structural requirement rather than an afterthought are not obstacles to AI-assisted delivery. They are what make it reliable enough to trust in production.
An AI-native delivery model, built properly, has automated quality gates on every commit – static analysis, security scanning, architecture compliance, dependency verification. It has human engineers who own the full delivery context, not handoff chains where accountability diffuses. And it treats the AI toolchain itself, including the prompt layer that governs AI behaviour, as a security surface requiring the same protection as code and infrastructure.
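As a sketch of what a per-commit gate can look like in practice, here is a minimal Python runner. The specific tools named (ruff, bandit, lint-imports, pip-audit) are assumptions standing in for whatever a given stack already uses; the structural point is that every gate runs on every commit and any failure blocks the merge.

```python
import subprocess

# Illustrative gate list: static analysis, security scanning, architecture
# compliance, and dependency verification, matching the categories above.
GATES = [
    ("static analysis",         ["ruff", "check", "."]),
    ("security scanning",       ["bandit", "-q", "-r", "src"]),
    ("architecture compliance", ["lint-imports"]),
    ("dependency verification", ["pip-audit"]),
]

def run_gates(gates=GATES) -> bool:
    """Run every gate and report each result; return True only if all pass.
    Wire the boolean into CI so a failing gate blocks the merge."""
    ok = True
    for name, cmd in gates:
        result = subprocess.run(cmd, capture_output=True, text=True)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"[{status}] {name}")
        ok = ok and result.returncode == 0
    return ok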
Amazon’s engineers had the tools and they had the mandate to use them. What they didn’t have, in some critical systems, was the structure that should have surrounded those tools from the start. The 90-day reset is the process of building that structure retrospectively.
The lesson for any organisation deploying AI in software development is that building the structure first is considerably less expensive than building it after the outage.
If your organisation is moving towards AI-assisted development, or already there, the question worth asking is whether the structure around your tooling was designed alongside the tooling itself, or whether it’s catching up. The issues Amazon, McKinsey, and Lloyds encountered aren’t unique to companies of their scale. The same gaps appear in teams of 10 as in teams of 10,000, and the blast radius is proportional to how much your AI toolchain is trusted to act autonomously in production.
At Future Processing, we build software using an AI-native delivery model that has quality controls built into the process from the start, not added after the fact. Every engagement includes automated quality gates on every commit covering static analysis, security scanning, architecture compliance, and dependency verification. Engineers own the full delivery context end-to-end, with no handoff chains where accountability diffuses. And the AI toolchain itself – including the prompt and configuration layer – is treated as a security surface, not an afterthought.
We work with mid-market companies across the UK on new digital products, legacy modernisation, and operational automation. Engagements start with a fixed-price sprint of 1 to 3 weeks, so you see working software on your real data before committing to anything larger.
If you’d like to talk through how this applies to your specific situation, get in touch with our team — we’re happy to have a no-commitment conversation about where the structure gaps are and how to address them.