Blog

Engineering
Who's Reviewing Your AI-Generated Code?
By Lumia Labs/ On 24 Jan, 2026

96% of developers do not fully trust that AI-generated code is functionally correct. Yet only 48% say they always check AI code before committing. The gap between what developers believe and what they do defines the state of AI coding in 2026. We're building systems on a foundation that the developers themselves don't trust, and roughly half of them don't always verify it before it is committed.

These findings come from Sonar's 2026 State of Code Developer Survey, covering 1,100 developers. Combined with research from METR, CodeRabbit, and Veracode, and with our own experience working with development teams that use AI, they point to several challenges.

AI coding is now the default

72% of developers who've tried AI tools now use them daily or multiple times daily. GitHub Copilot leads adoption at 75%, followed closely by ChatGPT at 74%, Claude at 48%, and Gemini at 37%. Cursor, which barely existed two years ago, is now at 31%.

The code itself tells the story: 42% of production code now includes significant AI assistance. That's up from 6% in 2023. Projections suggest 65% by 2027.

Developers use corporate-approved AI tools, but 35% also use personal accounts. That means roughly a third of your team might be feeding proprietary code into AI systems that you don't control.

The verification gap

Back to that 96% figure. Developers have seen where AI goes wrong: hallucinated functions, subtle logic errors, security patterns that look right but aren't. Almost everyone understands the problem.

Yet verification is inconsistent, and code often is not properly reviewed before it hits production. The Sonar survey found that while 95% of developers spend at least some effort reviewing AI output, the distribution varies widely. 59% rate that effort as "moderate" or "substantial." And the remaining 5%? They're shipping AI code with minimal or no review at all. When LLMs generate more code faster, teams need to put more of their attention into reading and reviewing that code.
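One way to narrow that gap is to make the review step mechanical rather than optional. The sketch below is illustrative only. It assumes a hypothetical team convention in which AI-assisted commits carry an `AI-Assisted: true` trailer in the commit message and reviewed commits carry a `Reviewed-by:` trailer. The `AI-Assisted` trailer and the function name are our own inventions, not something any tool enforces by default.

```python
def needs_review(commit_message: str) -> bool:
    """Return True when a commit is marked AI-assisted but names no reviewer.

    Assumes a hypothetical convention: "Key: value" trailers on their own
    lines, e.g. "AI-Assisted: true" and "Reviewed-by: <name>".
    """
    trailers = {}
    for line in commit_message.splitlines():
        key, sep, value = line.partition(":")
        # Keep only lines that look like "Key: non-empty value".
        if sep and value.strip():
            trailers[key.strip().lower()] = value.strip()
    ai_assisted = trailers.get("ai-assisted", "").lower() == "true"
    reviewed = "reviewed-by" in trailers
    return ai_assisted and not reviewed


if __name__ == "__main__":
    unreviewed = "Add retry logic\n\nAI-Assisted: true\n"
    reviewed = "Add retry logic\n\nAI-Assisted: true\nReviewed-by: Sam\n"
    print(needs_review(unreviewed))  # True
    print(needs_review(reviewed))    # False
```

A check like this could run in CI or a commit hook and block the merge when it returns True. It only verifies that someone signed off, not that the review was any good, so it complements human review rather than replacing it.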
38% of developers say reviewing AI code takes more effort than reviewing human code. Only 27% say it takes less. For many teams, the time saved on writing is being eaten by review overhead. Or worse, the review is skipped entirely. What we see in practice is that a lot of junior engineers don't even read their own pull requests anymore.

Quality concerns

External research confirms these quality concerns. CodeRabbit's 2025 analysis found AI-generated code has 1.7x more issues than human-written code. Veracode's 2025 GenAI Code Security Report found that 45% of AI-generated code contains vulnerabilities.

When asked about their concerns, developers ranked them this way:

AI code that looks correct but isn't reliable: 61%
Exposure of sensitive data: 57%
Introduction of severe security vulnerabilities: 44%

AI writes code that's subtly wrong in ways that are hard to catch during review, especially the kind of rushed review that happens when teams are shipping fast.

AI effectiveness

When asked where AI tools are most effective, developers answered:

Creating documentation: 74%
Explaining and understanding existing code: 66%
Generating tests: 59%
Assisting in development of new code: 55%

AI works best on tasks that are well-defined, have clear success criteria, and don't require deep business context. Even there, LLM-generated documentation often fails to capture the "why" behind the code.

AI struggles with architectural decisions, business logic, security in context, and long-term maintainability. These require understanding the broader system, the organization's constraints, and consequences that extend beyond the immediate task. Larger context windows and better models may ease these problems in the future.

Smart teams are deploying AI strategically. They know where AI helps them, but they're not expecting it to replace thoughtful engineering.

Agents

64% of developers have now used AI agents like Claude Code. 25% use them regularly.
The top use cases are creating documentation (68%), automating test generation (61%), and automating code review (57%).

Agents represent the next step: autonomous systems handling entire workflows with less human oversight. But the same trust issues that plague AI code generation get amplified when agents operate independently.

Questions engineering leaders should be asking:

What guardrails exist for agent actions?
How do we audit what agents are doing?
What's our rollback strategy when agents make mistakes?

AI productivity

Does AI make developers more productive, or do they only feel more productive?

The average time developers spend on tedious tasks is 10 hours per week. This hasn't decreased despite widespread AI tool adoption. What's changed is the composition of that time: developers now spend it reviewing AI-generated code instead of writing code themselves.

The math doesn't always add up. If AI makes code generation faster but review takes as long or longer, where's the net gain? More code means more PRs, and then review becomes the bottleneck. Quality issues surface later in the pipeline. Time "saved" on writing gets spent on debugging and fixing.

METR's 2025 randomized controlled trial found that experienced developers were actually 19% slower when using AI tools on real-world open source tasks. Productivity is hard to measure objectively, but AI does not automatically make developers more productive. Measure your full delivery cycle and pick meaningful metrics.

What engineering leaders should do

Acknowledge the trust gap. Your developers don't trust AI output. Build processes that account for this. Don't assume AI equals automatic productivity gains.

Mandate verification standards. If only 48% always check AI code, make checking mandatory. Add automated quality gates for AI-generated code and enhanced review for AI-assisted commits. Improve pre-commit checks: the more you can catch automatically, the better.

Address shadow AI.
Having 35% of developers on personal accounts is a governance problem. Provide approved tools with enterprise controls, and create clear policies. If you have approved tools, don't restrict them to save costs, because that drives people toward personal accounts.

Measure what matters. Track end-to-end delivery time, not just coding speed. Monitor code quality metrics over time. Watch for increases in post-deployment issues.

Right-size expectations. AI excels at documentation, test generation, and code explanation. It struggles with reliability, security, and architecture.

Prepare for agents. If your team is using agents, governance frameworks need to be in place now. Audit trails and rollback capabilities are non-negotiable.

The organizations that will thrive are the ones that adopt AI thoughtfully: realistic expectations, solid verification, clear governance.

Lumia Labs helps organizations build software that works. If you're navigating AI tool adoption and want a technical perspective grounded in 25 years of enterprise experience, we'd like to hear from you.
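The "where's the net gain?" question from the productivity discussion becomes simple arithmetic once a team tracks the right numbers. A minimal sketch follows. The field names are hypothetical and the example hours are made up for illustration, not taken from any survey.

```python
from dataclasses import dataclass


@dataclass
class WeeklyHours:
    """Hours per developer per week, as measured by your own team."""
    writing_saved: float  # time no longer spent writing code by hand
    extra_review: float   # added time reviewing AI-generated code
    extra_rework: float   # added debugging and fixing after merge


def net_gain(hours: WeeklyHours) -> float:
    """Positive means AI saved time overall; negative means it cost time."""
    return hours.writing_saved - hours.extra_review - hours.extra_rework


if __name__ == "__main__":
    # Illustrative numbers only: 4h saved, 2.5h more review, 2h more rework.
    print(net_gain(WeeklyHours(4.0, 2.5, 2.0)))  # -0.5
```

The point of the sketch is the shape of the calculation: a team that only records the first field will always conclude that AI made it faster.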

Engineering
The Architecture Review Checklist
By Lumia Labs/ On 16 Jan, 2026

Nobody hands you a map when you inherit a codebase. Maybe you're the new CTO and this is your first week. Maybe your company just acquired software built by strangers. Maybe the founders left, and now it's you.

The documentation is thin. The git history tells stories you weren't there for. The system runs, mostly, but you don't know why. You definitely don't know where it's going to break.

This post gives you questions. Questions that surface problems before those problems surface themselves, usually on a holiday weekend. You'll learn more from the questions you can't answer than the ones you can. Each "I don't know" tells you something.

Three areas matter most: security model, operational readiness, and change velocity. Each reveals different risks. Together, they tell you what you've actually inherited.

Security Model: Trust Archaeology

Security in inherited systems is archaeological. Layers of decisions made by different people, under different threat models, at different stages of the company's life. Your job is to excavate.

Where does trust get granted?

Can you trace the path from untrusted input to database write? Where exactly does the system decide to trust that input? Most teams can't draw this picture. Authentication happens in one service, authorization checks live somewhere else, input validation scatters across three microservices. That gap between components is where vulnerabilities hide.

What's the blast radius?

If one component gets compromised, what else falls with it? Look for shared database credentials, service accounts with God-mode access, secrets in environment variables that every service can read. These patterns made sense when the system was three developers and one server. Now they mean a single breach cascades everywhere. The 2013 Target breach started with an HVAC contractor's credentials and ended with 40 million stolen credit cards. Nothing stopped lateral movement once attackers were inside.

What happens when auth fails?
When the authentication service goes down, do requests fail or pass? Under pressure, many systems fail open: the "temporary bypass" that was never removed, the fallback that skips validation. These exist in almost every inherited codebase. Find them before an attacker does.

What security decisions assumed a different world?

The single-tenant system that became multi-tenant, the internal tool that became customer-facing. Security flaws can persist through multiple product releases, especially when inherited from dependencies. Ask when the last security review happened. Then ask what changed since.

Security gaps don't just threaten data. They threaten the deal, the acquisition premium that evaporates after a breach.

Operational Readiness: What Happens When It Breaks

The system teaches you how it fails, but only if you're listening. Operational readiness means you can trace from "something's wrong" to "here's the line of code" before your customers start posting about it.

What happens when the system breaks at night?

If something fails, will the right person wake up with enough context to act? Many inherited systems have alerts that fire into Slack channels nobody watches after dinner. That's wishful thinking dressed as monitoring. Check who's actually on-call, what information they receive, and whether they can do anything useful with it.

What failure modes has this system never experienced?

If it's never seen a database failover, never handled a dependent service going dark, never weathered real production load, you don't know how it behaves in those scenarios. The absence of incidents might mean the system is resilient, but it might also mean you've been lucky.

Which alerts does everyone ignore?

Alert fatigue is operational debt with compound interest: every false positive trains your team to dismiss the next notification, until eventually the real incident gets lost in the noise. Ask how many alerts fired last week and how many were actionable.
If less than half led to human action, your monitoring is mostly noise.

How long does recovery actually take, and have you ever tested it?

Most inherited systems have backups that have never been restored, failover procedures that have never been executed, and runbooks written by people who left years ago. Your documented recovery time means nothing until you've run through it under pressure. GitLab learned this the hard way in 2017 when they discovered during an incident that their backups weren't working. For context: DORA research shows that elite-performing teams restore service in under an hour. Low performers take between a week and a month.

Who actually knows how this works?

If the answer is one person, you have a single point of failure. Researchers call this the "bus factor," the number of people who could leave before a project stalls. A study at JetBrains found that files abandoned by their original developers tend to stay abandoned, becoming permanent blind spots in the codebase. If the answer is "the team that left," you're operating on muscle memory that's already fading.

Change Velocity: The Fear Tax

Developers who are scared to touch code batch changes into risky big releases. Fixes get deferred because they might break something else. Technical debt accumulates because nobody wants to venture into the dangerous parts. Survey research across the software industry found that teams waste 23% of their development time dealing with technical debt. Deadline pressure is the most common cause. Fear is a legitimate architectural metric.

Where do developers refuse to go?

Every inherited system has these zones: the billing code nobody fully understands, the integration with that legacy system held together by careful attention and prayer. These become permanent blind spots and permanent sources of risk.

What's the worst that happens from a typo?

Can one wrong character bring down production?
A healthy architecture survives small mistakes, while a fragile one demands perfection at all times.

Do your tests actually catch bugs?

A green test suite that misses regressions creates false confidence. Teams deploy because the tests passed, when the tests weren't checking what mattered. Look at the last few bugs that reached production. Should the tests have caught them?

How long until a new engineer can ship?

This measures friction. If it takes months to understand the system well enough to contribute safely, change will always be slow. The codebase is effectively defended against its own team. Etsy has new engineers deploy on day one.

Can you undo a bad deploy in minutes?

Willingness to ship correlates directly with ability to recover, and if rollback is scary, slow, or uncertain, every change feels permanent. Teams stop taking reasonable risks and the system calcifies.

What to Do With These Answers

Start with the questions you couldn't answer. Those are your blind spots, and blind spots don't stay hidden forever.

For the answers that worried you, write them down now. Next week the urgency will fade, you'll rationalize, and the system will keep running while you tell yourself it's probably fine. Capture the concern while you still feel it.

Document the surprises for your team. If the mental model doesn't match reality, someone else will hit the same confusion, probably during an incident at the worst possible time.

These questions won't give you a complete picture. Nothing will. But they'll tell you where the gaps are, and that's where to start.

Lumia Labs partners with organizations navigating exactly this situation: inherited systems and pressure to move forward. If you want a second set of eyes on what you've inherited, let's talk.
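To make the "What happens when auth fails?" question from the security section concrete, here is a minimal sketch of the difference between failing open and failing closed. Everything in it is hypothetical: `AuthServiceDown`, the `check_token` callable, and both wrapper functions are stand-ins for illustration, not code from any real framework.

```python
class AuthServiceDown(Exception):
    """Raised by a (hypothetical) auth client when the service is unreachable."""


def is_allowed_fail_open(check_token, token: str) -> bool:
    """Anti-pattern: an auth outage silently grants access."""
    try:
        return check_token(token)
    except AuthServiceDown:
        return True  # the "temporary bypass" that never got removed


def is_allowed_fail_closed(check_token, token: str) -> bool:
    """Safer default: if the token cannot be verified, deny the request."""
    try:
        return check_token(token)
    except AuthServiceDown:
        return False


def unreachable_auth(token: str) -> bool:
    """Simulates the auth service being down."""
    raise AuthServiceDown()


if __name__ == "__main__":
    print(is_allowed_fail_open(unreachable_auth, "any-token"))    # True
    print(is_allowed_fail_closed(unreachable_auth, "any-token"))  # False
```

When excavating an inherited codebase, the thing to search for is the first pattern: an exception handler or fallback branch around an auth call that ends in "allow".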

Engineering
The Hidden Costs of Vibe Coding
By Lumia Labs/ On 08 Jan, 2025

The demo was impressive. A developer typed a prompt, and within seconds, working code appeared. The team lead smiled. Finally, a way to ship faster.

Six months later, that same team is drowning in technical debt they can't explain, debugging code nobody fully understands, and wondering why their "accelerated" project is now three months behind schedule.

We've seen this play out. According to MIT's GenAI Divide report, 95% of enterprise AI pilots fail to deliver rapid revenue growth or measurable cost savings. More striking: 42% of companies abandoned most of their AI initiatives in 2025, more than double the abandonment rate from 2024.

So what's happening? And more importantly, how should technical decision makers evaluate AI coding tools before adoption?

The Flow-Debt Trade-off

AI coding tools excel at one thing: generating plausible code quickly. That speed feels like productivity, but it isn't always.

The pattern we've seen repeatedly goes like this: initial development velocity spikes, developers report feeling more productive, and early features ship fast. Then the problems start appearing.

The generated code works, but it carries hidden assumptions: database queries that scan full tables, authentication flows that skip edge cases, API contracts that assume sunny-day scenarios only. Each piece makes sense in isolation, but together they create a system that gets harder to change with every addition.

Researchers call this the flow-debt trade-off: the seamless experience of generating code creates an accumulation of technical debt through architectural inconsistencies, security gaps, and maintenance overhead that only reveals itself later.

No architecture, no context

The same patterns show up again and again in AI-generated code, all stemming from the same limitation: AI tools optimize for the immediate task, not the system as a whole.

Architecture decisions get flattened.
The AI doesn't see your deployment constraints, your team's operational capacity, or your three-year roadmap. The result is often monolithic structures that work fine initially but resist scaling individual components independently.

Database queries go unoptimized. Generated code frequently uses ORM patterns that hide inefficient queries. Things work fine with 1,000 records. At 100,000 records, response times spike. At a million, the system becomes unusable during peak load.

Error handling stays shallow. AI generates the happy path well. It's less consistent with failure modes, retry logic, circuit breakers, and graceful degradation. Systems built this way work until something goes wrong, then fail in unpredictable ways.

Security gets surface treatment. Input validation appears, but business logic vulnerabilities slip through. Authorization checks exist, but privilege escalation paths remain. The code looks secure without being secure.

Observability is an afterthought. Logging statements appear, but structured logging for production debugging is rare. Metrics, traces, and alerting configurations are usually missing entirely.

Best practices

If you're using AI coding tools (and most teams are), here's how to get the benefits without the debt:

Measure total cost, not initial velocity. Track time spent debugging AI-generated code, refactoring architectural decisions, and addressing security findings. Compare against the time saved during generation.

Run your security review unchanged. Don't reduce scrutiny because the code "came from AI." If anything, increase it. Generated code often passes cursory review while hiding subtle issues.

Assess architectural coherence at milestones. Regularly examine whether the codebase still follows your intended patterns. Drift happens fast with generated code because each snippet optimizes locally, not globally.

Keep doing pull request reviews. Code review matters more with AI-generated code, not less.
If you're the one creating the PR, review your own code before asking others to look at it. The AI wrote it, but you're responsible for it.

Plan for refactoring cycles. AI-assisted codebases typically need more aggressive refactoring than traditionally developed ones. Budget for this upfront.

Keep humans on critical paths. Authentication, authorization, payment processing, and data handling warrant extra scrutiny regardless of how the initial code was written.

The companies getting it right

The organizations succeeding with AI coding tools share common patterns: they treat generated code as a starting point rather than a finished product, maintain strong architectural oversight, and invest in code review practices that catch the systematic issues AI introduces.

They also recognize that developer productivity and system quality are different metrics. Optimizing for one at the expense of the other creates problems that take years to resolve.

We've spent 25 years building enterprise systems. The fundamentals haven't changed: good architecture enables teams to move fast without breaking things. AI tools don't change this equation. They just make it easier to skip the foundational work that pays off later.

If you're evaluating AI coding tools, start with contained experiments. Measure outcomes over months, not days. And bring architectural thinking to the conversation before you have thousands of lines of generated code that nobody fully understands.

The technology is genuinely useful, and finding the balance requires the kind of judgment that can't be automated.

Lumia Labs helps organizations build scalable systems and improve existing codebases. If you're navigating AI adoption and want a technical perspective, we'd like to hear from you.
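As an illustration of the "error handling stays shallow" point, the sketch below shows the kind of failure-mode code that generated happy-path snippets tend to leave out: a bounded retry with exponential backoff. It is a simplified example under our own assumptions, not a production pattern. The function name and parameters are hypothetical, and real code should catch only errors known to be transient rather than every `Exception`.

```python
import time


def call_with_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on failure wait base_delay * 2**n, then try again.

    Re-raises the last exception once all attempts are used up, so the
    caller still sees the failure instead of a silent None.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:  # simplification: narrow this in real code
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def make_flaky(failures):
    """Build a callable that fails `failures` times, then returns "ok"."""
    state = {"left": failures}

    def flaky():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("transient failure")
        return "ok"

    return flaky


if __name__ == "__main__":
    delays = []
    # Inject delays.append as the sleep function so the demo runs instantly.
    print(call_with_retry(make_flaky(2), sleep=delays.append))  # ok
    print(delays)  # [0.1, 0.2]
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting. A circuit breaker or graceful degradation would sit a layer above this and is not shown here.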

News
Welcome to the Lumia Labs Blog
By Lumia Labs/ On 01 Jan, 2025

Welcome to the Lumia Labs blog. Here we share insights on software development, emerging technologies, and lessons learned from 25 years of building enterprise solutions.

What to Expect

We cover topics including:

Software Architecture: Best practices for building scalable, maintainable systems
AI & Machine Learning: Practical applications of AI in enterprise software
Performance Optimization: Tips and techniques for faster, more efficient applications
Legacy Modernization: Strategies for updating outdated systems
Industry Insights: Trends and developments shaping the software landscape

Stay Connected

Follow along as we share lessons from our projects, explore new technologies, and discuss the challenges and opportunities in modern software development.

Have a topic you'd like us to explore? We'd love to hear from you.