
Who's Reviewing Your AI-Generated Code?
96% of developers don’t fully trust AI-generated code to be functionally correct. Yet only 48% say they always check AI code before committing.
The gap between what developers believe and what they do defines the state of AI coding in 2026. We’re building systems on a foundation that developers themselves don’t trust, and roughly half aren’t verifying it before it hits production.
These findings come from Sonar’s 2026 State of Code Developer Survey of 1,100 developers. Combined with research from METR, CodeRabbit, and Veracode, and our own experience with development teams adopting AI, they point to a consistent set of challenges.
AI coding is now the default
72% of developers who’ve tried AI tools now use them daily or multiple times daily. GitHub Copilot leads adoption at 75%, followed closely by ChatGPT at 74%, Claude at 48%, and Gemini at 37%. Cursor, which barely existed two years ago, is now at 31%.
The code itself tells the story: 42% of production code now includes significant AI assistance. That’s up from 6% in 2023. Projections suggest 65% by 2027.
Developers use corporate-approved AI tools, but 35% also use personal accounts. That means a third of your team might be feeding proprietary code into AI systems that you don’t control.
The verification gap
Back to that 96% figure. Developers have seen where AI goes wrong: hallucinated functions, subtle logic errors, the security patterns that look right but aren’t. Almost everyone understands the problem.
Yet verification is inconsistent, and code often reaches production without a proper review. The Sonar survey found that while 95% of developers spend at least some effort reviewing AI output, the distribution varies widely: 59% rate that effort as “moderate” or “substantial.” And the remaining 5%? They’re shipping AI code with minimal or no review at all.
When LLMs can generate more code, faster, the work shifts toward reading and reviewing that code. 38% of developers say reviewing AI code takes more effort than reviewing human code. Only 27% say it takes less.
For many teams, the time saved on writing is being eaten by review overhead, or worse, the review is skipped entirely. What we see in practice is that a lot of junior engineers don’t even read their own pull requests anymore.
Quality concerns
External research backs up these quality concerns. CodeRabbit’s 2025 analysis found that AI-generated code has 1.7x more issues than human-written code. Veracode’s 2025 GenAI Code Security Report found that 45% of AI-generated code contains vulnerabilities.
When asked about their concerns, developers ranked them this way:
- AI code that looks correct but isn’t reliable: 61%
- Exposure of sensitive data: 57%
- Introduction of severe security vulnerabilities: 44%
AI writes code that’s subtly wrong in ways that are hard to catch during review, especially the kind of rushed review that happens when teams are shipping fast.
AI effectiveness
When asked where AI tools are most effective, developers point to:
- Creating documentation: 74%
- Explaining and understanding existing code: 66%
- Generating tests: 59%
- Assisting in development of new code: 55%
AI works best on tasks that are well-defined, have clear success criteria, and don’t require deep business context. LLM-generated documentation often fails to capture the ‘why’ behind the code.
Where AI struggles is with architectural decisions, business logic, security in context, and long-term maintainability. These require understanding the broader system, the organization’s constraints, and consequences that extend beyond the immediate task. Larger context windows and improved models may ease some of these limitations over time.
Smart teams are deploying AI strategically. They know where the AI helps them, but they’re not expecting it to replace thoughtful engineering.
Agents
64% of developers have now used AI agents like Claude Code. 25% use them regularly. The top use cases are creating documentation (68%), automating test generation (61%), and automating code review (57%).
Agents represent the next step: autonomous systems handling entire workflows with less human oversight. But the same trust issues that plague AI code generation get amplified when agents operate independently.
Questions engineering leaders should be asking: What guardrails exist for agent actions? How do we audit what agents are doing? What’s our rollback strategy when agents make mistakes?
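To make the first two questions concrete, here is a minimal sketch in Python of a guardrail and audit layer for agent tool calls. It isn’t tied to any particular agent framework; the action names, the allowlist, and the log file are illustrative assumptions, not an established API.

```python
import json
import time
from pathlib import Path

# Illustrative allowlist and log path; both are assumptions, not a standard.
ALLOWED_ACTIONS = {"read_file", "run_tests", "open_pull_request"}
AUDIT_LOG = Path("agent_audit.jsonl")


def execute_agent_action(action: str, args: dict, executor):
    """Run an agent-requested action through an allowlist and an audit trail."""
    entry = {"ts": time.time(), "action": action, "args": args}
    if action not in ALLOWED_ACTIONS:
        entry["result"] = "blocked"
        _append_audit(entry)
        raise PermissionError(f"Agent action '{action}' is not allowed")
    try:
        result = executor(action, args)  # your actual tool dispatcher goes here
        entry["result"] = "ok"
        return result
    except Exception as exc:
        entry["result"] = f"error: {exc}"
        raise
    finally:
        _append_audit(entry)


def _append_audit(entry: dict) -> None:
    # Append-only JSON Lines log so every agent action can be reviewed later.
    with AUDIT_LOG.open("a") as fh:
        fh.write(json.dumps(entry) + "\n")
```

The rollback question still needs its own answer (reversible deployments, feature flags, or plain git reverts), but an allowlist plus an append-only log is the minimum needed to answer “what did the agent actually do?”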
AI productivity
Does AI make developers more productive, or do they only feel more productive? Developers still spend an average of 10 hours per week on tedious tasks, a figure that hasn’t decreased despite widespread AI tool adoption. What’s changed is the composition of that time: developers now spend it reviewing AI-generated code instead of writing code themselves.
The math doesn’t always add up. If AI makes code generation faster but review takes as long or longer, where’s the net gain? More code means more PRs, and then review becomes the bottleneck. Quality issues surface later in the pipeline. Time “saved” on writing gets spent on debugging and fixing.
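A rough back-of-envelope calculation makes the point. All the numbers below are illustrative assumptions, not survey data:

```python
# Illustrative weekly numbers for one developer; adjust to your own team's data.
hours_saved_writing = 4.0   # time AI shaves off writing code
extra_review_hours = 3.0    # added effort reviewing AI output
rework_hours = 1.5          # debugging and fixing issues that slipped through

net_gain = hours_saved_writing - extra_review_hours - rework_hours
print(f"Net productivity change: {net_gain:+.1f} hours/week")  # -0.5 in this scenario
```

With those assumptions the “faster” workflow is a net loss of half an hour a week. Your numbers will differ, which is exactly why they need to be measured rather than assumed.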
METR’s 2025 randomized controlled trial found that experienced developers were actually 19% slower when using AI tools on real-world open-source tasks. Productivity is hard to measure objectively, but it clearly isn’t always true that AI makes developers more productive.
Measure your full delivery cycle and pick meaningful metrics.
What engineering leaders should do
Acknowledge the trust gap. Your developers don’t trust AI output. Build processes that account for this. Don’t assume AI equals automatic productivity gains.
Mandate verification standards. If only 48% always check AI code, make checking mandatory. Add automated quality gates for AI-generated code and enhanced review for AI-assisted commits. Strengthen pre-commit checks: the more you can catch automatically, the better.
Address shadow AI. 35% of developers using personal accounts is a governance problem. Provide approved tools with enterprise controls and create clear policies. And if you have approved tools, don’t throttle them to save costs; that just drives people back to personal accounts.
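As one starting point, here is a minimal sketch of a git commit-msg hook in Python. It assumes a team convention of “AI-Assisted:” and “Reviewed-by:” trailers in commit messages; both trailer names are hypothetical conventions you would have to adopt, not a git standard.

```python
#!/usr/bin/env python3
# Minimal git commit-msg hook: save as .git/hooks/commit-msg and mark executable.
# Blocks commits flagged as AI-assisted unless a human reviewer is named.
import re
import sys


def main() -> int:
    with open(sys.argv[1], encoding="utf-8") as fh:  # git passes the message file path
        msg = fh.read()
    ai_assisted = re.search(r"^AI-Assisted:\s*yes", msg, re.IGNORECASE | re.MULTILINE)
    reviewed = re.search(r"^Reviewed-by:\s*\S+", msg, re.MULTILINE)
    if ai_assisted and not reviewed:
        sys.stderr.write(
            "Commit is marked AI-Assisted but has no Reviewed-by trailer.\n"
            "Add 'Reviewed-by: <name>' after a human review, then commit again.\n"
        )
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A hook like this doesn’t replace review; it just makes skipping review a deliberate act rather than a default.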
Measure what matters. Track end-to-end delivery time, not just coding speed. Monitor code quality metrics over time. Watch for increases in post-deployment issues.
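Even a crude measurement of delivery time beats none. Below is a minimal sketch for computing median lead time from first commit to deployment; the field names and sample records are illustrative, and in practice the data would come from your VCS and deployment pipeline.

```python
from datetime import datetime
from statistics import median

# Illustrative records; replace with data pulled from your VCS and deploy pipeline.
changes = [
    {"first_commit": "2026-01-05T09:12:00", "deployed": "2026-01-07T16:40:00"},
    {"first_commit": "2026-01-06T11:03:00", "deployed": "2026-01-06T18:20:00"},
    {"first_commit": "2026-01-08T14:30:00", "deployed": "2026-01-12T10:05:00"},
]


def lead_time_hours(change: dict) -> float:
    start = datetime.fromisoformat(change["first_commit"])
    end = datetime.fromisoformat(change["deployed"])
    return (end - start).total_seconds() / 3600


print(f"Median lead time: {median(lead_time_hours(c) for c in changes):.1f} hours")
```

Tracked over time, a number like this shows whether AI adoption is actually shortening delivery or just shifting effort from writing to reviewing and fixing.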
Right-size expectations. AI excels at documentation, test generation, and code explanation. It struggles with reliability, security, and architecture.
Prepare for agents. If your team is using agents, governance frameworks need to be in place now. Audit trails and rollback capabilities are non-negotiable.
The organizations that will thrive are the ones that adopt AI thoughtfully: realistic expectations, solid verification, clear governance.
Lumia Labs helps organizations build software that works. If you’re navigating AI tool adoption and want a technical perspective grounded in 25 years of enterprise experience, we’d like to hear from you.




