How to Review AI Generated Code Without Breaking Architecture
Learn how to review AI generated code to prevent architectural drift and security debt. Implement these practical guardrails for maintainable, production-ready output.
The incident started at 2:00 AM on a Tuesday. We were chasing a ghost: intermittent null pointer exceptions in our payment reconciliation microservice. My logs showed data hitting the persistence layer, but the database remained unmoved. Three hours later, I found it. The AI assistant tasked with refactoring our JPA repository had hallucinated a custom method signature, `findByTransactionIdAndStatus`, and invented a corresponding SQL projection that didn't exist in our schema.
The code compiled. The tests, also AI-generated, passed because they were mocking the hallucinated repository method itself. Circular logic, perfectly executed, and completely broken.
We’re shipping code faster than ever. But velocity without verification is just technical debt in a trench coat. If you’re trying to review AI-generated code, stop treating it like a junior dev’s PR. Treat it like a black box that actively takes the path of least resistance, which is rarely the path of architectural integrity.
The Vibe Coding Trap
"Vibe coding"—prompting until the syntax looks right and hitting commit—is the single greatest threat to codebase health I’ve seen in a decade. Research shows that 68% of enterprises currently rewrite over half of their AI-generated output before production. The culprit isn’t just bad logic; it’s architectural misalignment.
When an LLM writes code, it optimizes for plausibility, not maintainability. It satisfies the local scope. It ignores the clean architecture patterns you spent months enforcing.
Stop checking for syntax. Your IDE does that. Stop checking for formatting. Your linter does that. Your review must shift entirely toward contextual alignment. If the AI suggests a new pattern, does it match the existing dependency injection container? Does it bypass our established middleware? If it fails either check, the code is a failure, regardless of whether it runs.
Architectural Integrity
Architectural drift is silent. It doesn't trigger alerts. It manifests as a gradual decay where your codebase loses its structural identity.
Treat AI-generated code as a foreign object. Use this checklist during every review:
1. Dependency Alignment: Does the code import something unauthorized? If the AI imports `lodash` in a project where we use native ESM functions, it’s a hard reject.
2. Scope Verification: Does the component talk to the database directly when it should be calling a service layer?
3. Pattern Parity: Does it use the error-handling wrapper we’ve defined in `src/common/errors.ts`?
If the AI generated a `UserController` that handles HTTP status codes manually instead of using `ResponseHandler`, reject it. Don't refactor it yourself. Send it back with the specific prompt: "Refactor to use `ResponseHandler` and adhere to the `ControllerInterface` in `src/types/base.ts`."
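The pattern-parity check can be sketched concretely. Everything below (`HttpResponse`, `ResponseHandler`, the two controller functions) is a hypothetical stand-in for a project's own abstractions, not a real library API:

```typescript
// Hypothetical stand-ins for a project's shared response abstractions.
type HttpResponse = { status: number; body: unknown };

// The shared wrapper every controller is expected to route through.
class ResponseHandler {
  static ok(data: unknown): HttpResponse {
    return { status: 200, body: data };
  }
  static notFound(message: string): HttpResponse {
    return { status: 404, body: { error: message } };
  }
}

// REJECT: the AI hand-rolled status codes and bypassed the shared wrapper.
function getUserRejected(id: string): HttpResponse {
  if (!id) return { status: 404, body: { error: "User not found" } };
  return { status: 200, body: { id } };
}

// ACCEPT: identical behavior, expressed through the established pattern.
function getUserAccepted(id: string): HttpResponse {
  if (!id) return ResponseHandler.notFound("User not found");
  return ResponseHandler.ok({ id });
}
```

The two functions behave identically at runtime, which is exactly why syntax-level review misses the problem: only the second one survives the next change to how your API reports errors.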
Ghost Dependencies
Hallucinations are the silent killer. An LLM doesn't know your internal API; it only knows the probability of what the function should look like.
Take this internal incident:
```typescript
// AI-generated implementation
const payment = await stripe.paymentIntents.captureFullAmount(intentId);
// CRITICAL: stripe.paymentIntents.captureFullAmount does not exist in v14.0.0
```

The AI hallucinated a helper method that sounded correct. Because the developer didn't verify it against the type definition file, this would have crashed in production.
The rule is simple: if the AI invokes a method you haven't used before, verify it against the source definition. If you can't find it in your `node_modules`, it’s a hallucination. Treat every "magic" helper function as a security breach until proven otherwise.
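One cheap guardrail, sketched under assumptions: before trusting a "magic" helper, probe the client object in a unit test and confirm the method actually exists. `stripeLike` below is a hypothetical stand-in for a real SDK client, not the Stripe library itself:

```typescript
// Hypothetical stand-in for an SDK client. Note there is no
// `captureFullAmount` -- that's the method the AI invented.
const stripeLike = {
  paymentIntents: {
    capture: (id: string) => ({ id, status: "succeeded" }),
  },
};

// Walk a dotted path like "paymentIntents.captureFullAmount" and
// report whether it resolves to a callable function.
function methodExists(obj: object, path: string): boolean {
  let current: any = obj;
  for (const segment of path.split(".")) {
    if (current == null || typeof current[segment] === "undefined") {
      return false;
    }
    current = current[segment];
  }
  return typeof current === "function";
}
```

A one-line test like `methodExists(client, "paymentIntents.captureFullAmount")` fails in CI before the hallucination fails at 2:00 AM. (Type definitions catch this too; the runtime probe covers untyped or dynamically patched clients.)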
Secure Audits
Security is where the "vibe" breaks down. Testing shows that 45% of AI-generated code contains security flaws, and 86% of samples lack secure defaults against cross-site scripting (XSS).
We’ve seen AI produce SQL injection vulnerabilities by ignoring ORM parameterized query requirements. It prefers string concatenation because it's "easier."
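A minimal sketch of that contrast. No real database driver is invoked here; the functions just build the query so the difference is visible, with `$1`-style placeholders assumed as the binding syntax:

```typescript
// REJECT: attacker-controlled input is spliced directly into the SQL text,
// so `' OR '1'='1` changes the meaning of the query.
function unsafeLookup(email: string): string {
  return `SELECT * FROM users WHERE email = '${email}'`;
}

// ACCEPT: the SQL text is constant; the input travels separately as a
// bound parameter and is never parsed as SQL by the database.
function safeLookup(email: string): { sql: string; params: string[] } {
  return { sql: "SELECT * FROM users WHERE email = $1", params: [email] };
}
```

The review heuristic that falls out of this: any template literal or string concatenation that produces SQL is an automatic reject, no matter how harmless the current call sites look.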
The Security Audit Checklist
* The Sanitization Test: If the code handles `req.body`, is there an explicit validation/sanitization layer?
* The Prompt Injection Check: Are you using LLMs to generate code that processes untrusted AI input? As seen in CVE-2025-53773, malicious actors hide instructions in PR descriptions to trigger RCEs. Never pipe PR metadata into an AI agent without sanitizing the context.
* Hardcoded Secrets: Does the AI inline a key, or scatter raw `process.env.STRIPE_KEY` reads through business logic? Force the use of SecretManager or Vault patterns every single time.
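For the sanitization test, an explicit validation layer might look like the sketch below. The `PaymentRequest` shape and its rules are invented for illustration; in practice you'd likely reach for a schema library such as zod or joi, but the point is that *some* layer stands between `req.body` and your logic:

```typescript
// Hypothetical request shape; adapt the fields and rules to your API.
type PaymentRequest = { amount: number; currency: string };

// Narrow an untrusted body to a typed, validated value or throw.
function validatePaymentBody(body: unknown): PaymentRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("body must be an object");
  }
  const { amount, currency } = body as Record<string, unknown>;
  if (typeof amount !== "number" || !Number.isFinite(amount) || amount <= 0) {
    throw new Error("amount must be a positive number");
  }
  if (typeof currency !== "string" || !/^[A-Z]{3}$/.test(currency)) {
    throw new Error("currency must be a 3-letter ISO code");
  }
  return { amount, currency };
}
```

In review, the question isn't whether this exact function exists; it's whether any handler touches `req.body` without passing through something like it.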
Consistency via `rules.md`
Consistency is the antidote to technical debt. If your team is using Cursor or Copilot, keep a shared `rules.md` file in the repo root. This is a programmatic guardrail.
A standardized `rules.md`:
```markdown
# Architectural Guardrails for AI Assistants
1. NEVER use native `fetch`. Use `ApiClient` defined in `src/network/client.ts`.
2. ALL database queries must pass through the `Repository` pattern. Direct `prisma` access is prohibited.
3. ERROR HANDLING: Do not catch errors locally. Re-throw as `ApplicationError` from `src/errors/`.
4. SECURITY: All string inputs must be passed through `validator.escape()` before rendering.
```

If the AI violates these, you have an objective standard to point to.
Failure Pattern: The "Refactor Loop"
I once reviewed a PR where a developer let an AI iterate five times. Each "improvement" drifted further from the original design. By the end, the function was 400 lines long, had zero test coverage, and used three state management patterns.
The Workflow:
1. The "Draft" Phase: AI generates the initial skeleton.
2. The "Human" Phase: Lead dev reviews against `rules.md`.
3. The "Tightening" Phase: Developer prompts: "Refactor this into three separate functions following the Command pattern, using the existing BaseCommand class."
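A sketch of what that "tightening" prompt is aiming for. `BaseCommand` here is a hypothetical stand-in for the existing class the prompt references, and the discount rule is invented purely for illustration:

```typescript
// Hypothetical stand-in for the project's existing abstract command class.
abstract class BaseCommand<TInput, TOutput> {
  abstract execute(input: TInput): TOutput;
}

// Instead of one 400-line function, each step becomes a small,
// independently testable command.
class ValidateOrder extends BaseCommand<{ amount: number }, boolean> {
  execute(input: { amount: number }): boolean {
    return input.amount > 0;
  }
}

class ApplyDiscount extends BaseCommand<number, number> {
  execute(amount: number): number {
    // Invented business rule: 10% off orders of 100 or more.
    return amount >= 100 ? amount * 0.9 : amount;
  }
}
```

The payoff of the split isn't elegance; it's that each command is small enough for a human to verify against `rules.md` in one read, which is precisely what the 400-line monolith prevented.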
Never let the AI write an entire feature in one go. You’ll lose track of the complexity. Review the interface first. Then, and only then, review the implementation. If you can't understand why the AI chose a specific library, don't ship it. The speed you gain by skipping the review will be paid back with interest during your next on-call rotation. Don't be the engineer who pushes a hallucination to production because it "looked like it worked."