
How to Get Guaranteed JSON From Your LLM Without Recursive Retry Logic

Stop writing fragile regex or recursive loops to catch malformed AI responses and start using grammar-constrained structured outputs to ensure your data always matches your TypeScript interfaces.


I spent three hours last Tuesday debugging a production edge case where GPT-4o decided, for no apparent reason, to prefix its JSON response with "Certainly! Here is your data:". My parser choked, the UI blanked, and I ended up writing a regex-filled recursive retry loop that felt more like a prayer than a piece of engineering.

We’ve all been there—begging the LLM to "only return JSON" in the system prompt, only to have it hallucinate a trailing comma or a markdown code block that ruins your day. But the era of the "Retry Loop" is finally dying. We don't have to ask nicely anymore; we can actually enforce the schema at the engine level.

The "Please, I'm Begging You" Era

Before we look at the fix, let's acknowledge how fragile the old way was. Usually, it looked something like this:

import OpenAI from "openai";

const openai = new OpenAI();

// Whatever shape you hoped the model would return
type Data = Record<string, unknown>;

async function getAnalysis(text: string, attempt = 0): Promise<Data> {
  try {
    const res = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: `Return JSON for: ${text}` }],
    });

    // The "Hold your breath" moment
    return JSON.parse(res.choices[0].message.content ?? "");
  } catch (e) {
    if (attempt < 3) return getAnalysis(text, attempt + 1);
    throw new Error("AI is being stubborn today.");
  }
}

This is terrible. It’s non-deterministic, it wastes tokens, and it adds latency. If the model fails three times, your user gets a spinner that lasts forever and then an error toast.

Enter Structured Outputs

Modern LLM providers (OpenAI, Anthropic, and local engines like vLLM or Ollama) now support Grammar-Constrained Sampling. Instead of the model choosing any token it wants, the engine "masks" the possible tokens to only those that would be valid according to a specific JSON schema.

If the schema says the next character must be a quote or a closing bracket, the model literally *cannot* pick a letter. It’s not just prompt engineering; it’s a hard constraint on the math.
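If you've never seen what that looks like, here's a toy TypeScript sketch of the idea. None of this is a real engine API; actual implementations (vLLM, llama.cpp, and friends) compile the schema into a grammar and apply a mask like this at every decoding step:

// Toy illustration of logit masking: tokens the grammar forbids are driven
// to -Infinity, so the sampler can never pick them, no matter how much the
// model "wants" to say "Certainly!".
function maskLogits(logits: number[], allowedTokenIds: Set<number>): number[] {
  return logits.map((logit, tokenId) =>
    allowedTokenIds.has(tokenId) ? logit : -Infinity
  );
}

// If the schema only allows `"` or `}` next, every other token's probability
// is effectively zero after the softmax.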

The TypeScript Way (with OpenAI)

The cleanest way to do this now is with Zod and the zodResponseFormat helper in the OpenAI SDK, which converts your schema to JSON Schema and passes it as the native response_format. Here is how you get a guaranteed object that matches your interface every single time:

import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
import OpenAI from "openai";

const openai = new OpenAI();

// 1. Define your interface using Zod
const ResearchPaperSchema = z.object({
  title: z.string(),
  authors: z.array(z.string()),
  summary: z.string(),
  impact_score: z.number().min(1).max(10),
});

async function analyzePaper(abstract: string) {
  const completion = await openai.beta.chat.completions.parse({
    model: "gpt-4o-2024-08-06",
    messages: [
      { role: "system", content: "Extract paper metadata." },
      { role: "user", content: abstract },
    ],
    // 2. This is the magic part
    response_format: zodResponseFormat(ResearchPaperSchema, "paper"),
  });

  // 3. No more JSON.parse()! It's already parsed and typed.
  const paper = completion.choices[0].message.parsed;
  
  if (!paper) throw new Error("Something went wrong");
  
  console.log(paper.title); // TypeScript knows this is a string
  return paper;
}
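Prefer not to pull in Zod? You can get the same guarantee by writing the JSON Schema yourself and passing it to the regular create call. This is a rough sketch based on the response_format shape in current OpenAI SDKs; double-check the exact fields against the docs for your SDK version:

// Same request without Zod: hand-written JSON Schema, strict mode on.
// You get a string back and still call JSON.parse yourself, but the engine
// guarantees the string parses and matches the schema.
async function analyzePaperRaw(abstract: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [
      { role: "system", content: "Extract paper metadata." },
      { role: "user", content: abstract },
    ],
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "paper",
        strict: true,
        schema: {
          type: "object",
          properties: {
            title: { type: "string" },
            authors: { type: "array", items: { type: "string" } },
            summary: { type: "string" },
            impact_score: { type: "number" },
          },
          required: ["title", "authors", "summary", "impact_score"],
          additionalProperties: false,
        },
      },
    },
  });

  return JSON.parse(completion.choices[0].message.content ?? "{}");
}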

Why this is a massive win

The most obvious benefit is that you can delete your retry logic. But there are deeper reasons to love this:

1. Zero Hallucinated Keys: The model won't decide to rename impact_score to rating just because it feels like it.
2. No Markdown Noise: You'll never get the payload wrapped in a markdown code fence (```json ... ```) again. The engine skips the formatting and goes straight to the data.
3. Type Safety: Because we're using Zod (or similar), the data coming out of the API call is validated at runtime against the exact same schema that defines our TypeScript types. If it's in your schema, it's in the response (see the sketch below).
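A nice side effect of that last point: because the schema lives in Zod, the TypeScript type can be derived from it instead of being maintained by hand. A small sketch, reusing the ResearchPaperSchema defined earlier:

import { z } from "zod";

// The TypeScript type is derived from the schema, so the runtime validation
// and the compile-time type can never drift apart.
type ResearchPaper = z.infer<typeof ResearchPaperSchema>;

function summarize(paper: ResearchPaper): string {
  // impact_score is a number here because the schema says so
  return `${paper.title} (${paper.impact_score}/10)`;
}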

The Gotchas (Because there are always gotchas)

It's not all sunshine and rainbows. There are a few things that tripped me up when I first switched:

* Strictness: To get the guarantee, you have to opt into strict mode (the Zod helper above sets strict: true for you). Strict mode requires every key to be present, so you can't have optional fields in the way you might be used to; you usually have to mark the field nullable and decide what null means yourself, or fill in a default after parsing (see the sketch after this list).
* Latency: There is a tiny bit of "pre-processing" time when the model first sees a new schema. It’s negligible for small schemas, but if you're sending a 500-line JSON schema, the first token might take an extra second to show up.
* Model Support: Not every model supports "Strict" mode. If you’re using older versions of GPT-3.5 or some open-source models via generic wrappers, you might still be stuck with the "begging" method.
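For the strictness gotcha, here's roughly what the workaround looks like in Zod, assuming OpenAI's current strict-mode rules (every key required, nullable instead of optional):

import { z } from "zod";

// Strict structured outputs want every key present, so "optional" fields
// become nullable ones. The caller then decides what null means.
const AnalysisSchema = z.object({
  title: z.string(),
  // Not z.string().optional(): strict mode requires every key to appear.
  subtitle: z.string().nullable(),
  impact_score: z.number(),
});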

Stop Parsing, Start Constraining

If you’re still writing code to "clean up" LLM responses by stripping out backticks or searching for the first { character, stop.

Move your logic into a schema. Let the engine handle the constraints. Your codebase will be smaller, your types will be real, and you might actually get to sleep through the night without an "Unexpected token 'C' in JSON" alert waking you up.