
Who's Reviewing Your AI-Generated Code?
96% of developers don’t fully trust AI-generated code to be functionally correct. Yet only 48% say they always check AI code before committing.
The gap between what developers believe and what they do defines the state of AI coding in 2026. We’re building systems on a foundation that developers themselves don’t trust, and roughly half aren’t verifying it before it hits production.
These findings come from Sonar’s 2026 State of Code Developer Survey of 1,100 developers. Combined with research from METR, CodeRabbit, and Veracode, and our own experience with development teams adopting AI, they point to a consistent set of challenges.
AI coding is now the default
72% of developers who’ve tried AI tools now use them daily or multiple times daily. GitHub Copilot leads adoption at 75%, followed closely by ChatGPT at 74%, Claude at 48%, and Gemini at 37%. Cursor, which barely existed two years ago, is now at 31%.
The code itself tells the story: 42% of production code now includes significant AI assistance. That’s up from 6% in 2023. Projections suggest 65% by 2027.
Developers use corporate-approved AI tools, but 35% also use personal accounts. That means a third of your team might be feeding proprietary code into AI systems that you don’t control.
The verification gap
Back to that 96% figure. Developers have seen where AI goes wrong: hallucinated functions, subtle logic errors, the security patterns that look right but aren’t. Almost everyone understands the problem.
Yet verification is inconsistent, and code often reaches production without a proper review. The Sonar survey found that while 95% of developers spend at least some effort reviewing AI output, the distribution varies widely: 59% rate that effort as “moderate” or “substantial.” And the remaining 5%? They’re shipping AI code with minimal or no review at all.
When LLMs can generate more code, faster, the work shifts toward reading and reviewing that code. 38% of developers say reviewing AI code takes more effort than reviewing human code. Only 27% say it takes less.
For many teams, the time saved on writing is being eaten by review overhead, or worse, the review is skipped entirely. What we see in practice is that a lot of junior engineers don’t even read their own pull requests anymore.
Quality concerns
External research backs up these quality concerns. CodeRabbit’s 2025 analysis found that AI-generated code has 1.7x more issues than human-written code. Veracode’s 2025 GenAI Code Security Report found that 45% of AI-generated code contains vulnerabilities.
When asked about their concerns, developers ranked them this way:
- AI code that looks correct but isn’t reliable: 61%
- Exposure of sensitive data: 57%
- Introduction of severe security vulnerabilities: 44%
AI writes code that’s subtly wrong in ways that are hard to catch during review, especially the kind of rushed review that happens when teams are shipping fast.
AI effectiveness
When asked where AI tools are most effective, developers point to:
- Creating documentation: 74%
- Explaining and understanding existing code: 66%
- Generating tests: 59%
- Assisting in development of new code: 55%
AI works best on tasks that are well-defined, have clear success criteria, and don’t require deep business context. LLM-generated documentation often fails to capture the ‘why’ behind the code.
Where AI struggles is with architectural decisions, business logic, security in context, and long-term maintainability. These require understanding the broader system, the organization’s constraints, and consequences that extend beyond the immediate task. Larger context windows and improved models may ease some of these limitations over time.
Smart teams are deploying AI strategically. They know where the AI helps them, but they’re not expecting it to replace thoughtful engineering.
Agents
64% of developers have now used AI agents like Claude Code. 25% use them regularly. The top use cases are creating documentation (68%), automating test generation (61%), and automating code review (57%).
Agents represent the next step: autonomous systems handling entire workflows with less human oversight. But the same trust issues that plague AI code generation get amplified when agents operate independently.
Questions engineering leaders should be asking: What guardrails exist for agent actions? How do we audit what agents are doing? What’s our rollback strategy when agents make mistakes?
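To make the first two questions concrete, here is a minimal sketch in Python of a guardrail and audit layer for agent tool calls. It isn’t tied to any particular agent framework; the action names, the allowlist, and the log file are illustrative assumptions, not an established API.

```python
import json
import time
from pathlib import Path

# Illustrative allowlist and log path; both are assumptions, not a standard.
ALLOWED_ACTIONS = {"read_file", "run_tests", "open_pull_request"}
AUDIT_LOG = Path("agent_audit.jsonl")


def execute_agent_action(action: str, args: dict, executor):
    """Run an agent-requested action through an allowlist and an audit trail."""
    entry = {"ts": time.time(), "action": action, "args": args}
    if action not in ALLOWED_ACTIONS:
        entry["result"] = "blocked"
        _append_audit(entry)
        raise PermissionError(f"Agent action '{action}' is not allowed")
    try:
        result = executor(action, args)  # your actual tool dispatcher goes here
        entry["result"] = "ok"
        return result
    except Exception as exc:
        entry["result"] = f"error: {exc}"
        raise
    finally:
        _append_audit(entry)


def _append_audit(entry: dict) -> None:
    # Append-only JSON Lines log so every agent action can be reviewed later.
    with AUDIT_LOG.open("a") as fh:
        fh.write(json.dumps(entry) + "\n")
```

The rollback question still needs its own answer (reversible deployments, feature flags, or plain git reverts), but an allowlist plus an append-only log is the minimum needed to answer “what did the agent actually do?”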
AI productivity
Does AI make developers more productive, or do they only feel more productive? Developers still spend an average of 10 hours per week on tedious tasks, a figure that hasn’t decreased despite widespread AI tool adoption. What’s changed is the composition of that time: developers now spend it reviewing AI-generated code instead of writing code themselves.
The math doesn’t always add up. If AI makes code generation faster but review takes as long or longer, where’s the net gain? More code means more PRs, and then review becomes the bottleneck. Quality issues surface later in the pipeline. Time “saved” on writing gets spent on debugging and fixing.
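A rough back-of-envelope calculation makes the point. All the numbers below are illustrative assumptions, not survey data:

```python
# Illustrative weekly numbers for one developer; adjust to your own team's data.
hours_saved_writing = 4.0   # time AI shaves off writing code
extra_review_hours = 3.0    # added effort reviewing AI output
rework_hours = 1.5          # debugging and fixing issues that slipped through

net_gain = hours_saved_writing - extra_review_hours - rework_hours
print(f"Net productivity change: {net_gain:+.1f} hours/week")  # -0.5 in this scenario
```

With those assumptions the “faster” workflow is a net loss of half an hour a week. Your numbers will differ, which is exactly why they need to be measured rather than assumed.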
METR’s 2025 randomized controlled trial found that experienced developers were actually 19% slower when using AI tools on real-world open-source tasks. Productivity is hard to measure objectively, but it clearly isn’t always true that AI makes developers more productive.
Measure your full delivery cycle and pick meaningful metrics.
What engineering leaders should do
Acknowledge the trust gap. Your developers don’t trust AI output. Build processes that account for this. Don’t assume AI equals automatic productivity gains.
Mandate verification standards. If only 48% always check AI code, make checking mandatory. Add automated quality gates for AI-generated code and enhanced review for AI-assisted commits. Strengthen pre-commit checks: the more you can catch automatically, the better.
Address shadow AI. 35% of developers using personal accounts is a governance problem. Provide approved tools with enterprise controls and create clear policies. And if you have approved tools, don’t throttle them to save costs; that just drives people back to personal accounts.
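As one starting point, here is a minimal sketch of a git commit-msg hook in Python. It assumes a team convention of “AI-Assisted:” and “Reviewed-by:” trailers in commit messages; both trailer names are hypothetical conventions you would have to adopt, not a git standard.

```python
#!/usr/bin/env python3
# Minimal git commit-msg hook: save as .git/hooks/commit-msg and mark executable.
# Blocks commits flagged as AI-assisted unless a human reviewer is named.
import re
import sys


def main() -> int:
    with open(sys.argv[1], encoding="utf-8") as fh:  # git passes the message file path
        msg = fh.read()
    ai_assisted = re.search(r"^AI-Assisted:\s*yes", msg, re.IGNORECASE | re.MULTILINE)
    reviewed = re.search(r"^Reviewed-by:\s*\S+", msg, re.MULTILINE)
    if ai_assisted and not reviewed:
        sys.stderr.write(
            "Commit is marked AI-Assisted but has no Reviewed-by trailer.\n"
            "Add 'Reviewed-by: <name>' after a human review, then commit again.\n"
        )
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A hook like this doesn’t replace review; it just makes skipping review a deliberate act rather than a default.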
Measure what matters. Track end-to-end delivery time, not just coding speed. Monitor code quality metrics over time. Watch for increases in post-deployment issues.
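Even a crude measurement of delivery time beats none. Below is a minimal sketch for computing median lead time from first commit to deployment; the field names and sample records are illustrative, and in practice the data would come from your VCS and deployment pipeline.

```python
from datetime import datetime
from statistics import median

# Illustrative records; replace with data pulled from your VCS and deploy pipeline.
changes = [
    {"first_commit": "2026-01-05T09:12:00", "deployed": "2026-01-07T16:40:00"},
    {"first_commit": "2026-01-06T11:03:00", "deployed": "2026-01-06T18:20:00"},
    {"first_commit": "2026-01-08T14:30:00", "deployed": "2026-01-12T10:05:00"},
]


def lead_time_hours(change: dict) -> float:
    start = datetime.fromisoformat(change["first_commit"])
    end = datetime.fromisoformat(change["deployed"])
    return (end - start).total_seconds() / 3600


print(f"Median lead time: {median(lead_time_hours(c) for c in changes):.1f} hours")
```

Tracked over time, a number like this shows whether AI adoption is actually shortening delivery or just shifting effort from writing to reviewing and fixing.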
Right-size expectations. AI excels at documentation, test generation, and code explanation. It struggles with reliability, security, and architecture.
Prepare for agents. If your team is using agents, governance frameworks need to be in place now. Audit trails and rollback capabilities are non-negotiable.
The organizations that will thrive are the ones that adopt AI thoughtfully: realistic expectations, solid verification, clear governance.
Lumia Labs helps organizations build software that works. If you’re navigating AI tool adoption and want a technical perspective grounded in 25 years of enterprise experience, we’d like to hear from you.




