
A Narrow Bridge Between a Failing Test and an LLM Response

The most effective way to use AI isn't asking it to write a feature, but giving it permission to fail until your test runner says otherwise.


The terminal is screaming red. AssertionError: Expected '2023-10-27', got '10/27/2023'. Most people see this and start typing a long, rambling prompt into ChatGPT like they’re writing a letter to a Victorian pen pal: *"Dear AI, I am working on a React project and I have a date utility that isn't formatting correctly. Can you please look at my code and tell me what is wrong? Here is the file..."*

Stop. You’re overworking the machine and under-specifying the problem.

The most effective bridge between your brain and an LLM isn’t a descriptive paragraph; it’s the raw, ugly output of a failing test. When you give an AI the permission to fail—and the specific error message to fix—you stop being a "prompt engineer" and start being a director.

The Lazy (But Correct) Workflow

I’ve found that the more I try to "explain" the logic to an AI, the more it hallucinates some generic boilerplate it saw on StackOverflow in 2019. Instead, I’ve started using a "Narrow Bridge" approach.

1. Write a test that defines the outcome.
2. Watch it fail.
3. Feed the LLM the failing test and the current implementation.
4. Tell it: "Make this test pass. Change nothing else."

Let's look at a concrete example. Say I'm building a simple parser for a messy configuration string.

# test_parser.py
from config_utils import extract_api_key

def test_extract_api_key():
    raw_input = "DEBUG=true; API_KEY=secret_123; TIMEOUT=30"
    assert extract_api_key(raw_input) == "secret_123"

def test_extract_api_key_missing():
    assert extract_api_key("DEBUG=false") is None

The initial code is just a placeholder because I'm lazy and haven't finished my coffee yet:

# config_utils.py
def extract_api_key(text):
    return "" 

Run pytest, and you get the red text. Instead of explaining the regex I want or the edge cases I’m worried about, I copy that failure message and the code into the LLM.

The Prompt:
*"Here is my code and the failing test output. Fix extract_api_key so the tests pass."*

Why This Works

When you give an AI a broad instruction ("Write a feature that parses strings"), it has infinite directions to go. It might add logging you don't need, use a library you haven't installed, or handle edge cases you don't care about yet.

When you give it a failing test, you’ve narrowed the bridge. The AI now has a clear definition of "done." It’s no longer guessing what "good" looks like; "good" is when the test runner stops complaining.

I’ve noticed that LLMs are surprisingly good at logic when they are constrained. If you give them 1,000 lines of context, they get lost in the woods. If you give them 10 lines of code and 5 lines of a stack trace, they become surgical.

The "Permission to Fail" Mindset

A lot of developers get frustrated because the AI doesn't write perfect code on the first try. My secret? I expect it to fail.

I’ll take the first response the AI gives me, even if I suspect it’s slightly off, and I’ll run it against the tests. If it fails, I don't argue with the AI. I don't say, "You forgot the semicolon." I just copy the *new* error message and throw it back.

"Still failing. Here's the new error: IndexError: list index out of range."

This creates a tight loop.
1. Human: Sets the goal (The Test).
2. AI: Attempts the solution.
3. Compiler/Test Runner: Provides the feedback.
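
If you want to see how little ceremony the loop actually needs, here's a minimal sketch in Python. Only the pytest call is real; ask_llm and apply_patch are hypothetical placeholders for whatever LLM client and file-writing glue you use.

# feedback_loop.py (a sketch; ask_llm and apply_patch are hypothetical)
import subprocess

def run_tests():
    # Run pytest quietly and capture all output; exit code 0 means green.
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def narrow_bridge(ask_llm, apply_patch, max_attempts=5):
    passed, output = run_tests()
    for _ in range(max_attempts):
        if passed:
            break
        # The entire prompt is the failure text: the narrow bridge.
        apply_patch(ask_llm(f"Still failing. Here's the new error:\n{output}"))
        passed, output = run_tests()
    return passed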

The "Bridge" is the error message. It’s the objective truth that exists between your intent and the AI’s hallucination.

A Gotcha: The Green Bar Trap

There is a danger here. If you’re not careful, the AI will "cheat." It might hardcode a return value just to make your specific test pass, like a cheeky student trying to get out of class early.

# The AI's "lazy" fix
def extract_api_key(text):
    if "secret_123" in text:
        return "secret_123"
    return None

This is why the bridge needs to be narrow, but the tests need to be robust. If the AI "cheats," you don't fix the code—you write another test that proves the cheat is invalid.
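For the hardcoded fix above, a single extra assertion with a different key value is enough to invalidate the cheat:

# test_parser.py (an extra case the hardcoded fix can't survive)
def test_extract_api_key_other_value():
    assert extract_api_key("API_KEY=another_456; DEBUG=true") == "another_456"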

I recently had an AI try to parse JSON with regex (don't ask). Instead of lecturing it on why that’s a sin, I just wrote a test case with nested objects that I knew regex would choke on. I ran the test, it failed, and I handed the failure back to the AI. Only then did it "realize" it should probably just use json.loads().
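A test along these lines is all it takes (parse_config is a hypothetical stand-in for the function in question):

# A nested case that a regex-based parser will choke on
def test_parse_nested_config():
    raw = '{"database": {"host": "localhost", "ports": [5432, 5433]}}'
    assert parse_config(raw)["database"]["ports"] == [5432, 5433]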

Stop Talking, Start Testing

The next time you're staring at a blank prompt, resist the urge to describe what you want. Write a failing test instead. It forces you to think through the requirements, and it gives the AI a target it can actually hit.

The goal isn't to get the AI to write the feature. The goal is to get the test runner to show a green bar. The AI is just the quickest path across that bridge.