The 'Smart' Search Struggle: Implementing Vector Embeddings in Remix Without the Math Degree

Keyword search is so 2010. I explored how to leverage vector embeddings and pgvector to give my Remix app a brain that actually understands what users are looking for.

Let’s be honest: standard SQL LIKE queries are the digital equivalent of a toddler trying to fit a square peg in a round hole.

We’ve all been there. You build a beautiful Remix app, you’ve got your loaders and actions humming along, and then a user types "comfortable walking shoes" into your search bar. Your database—bless its heart—looks at the table and says, "Sorry, I only have 'ergonomic sneakers.' I have no idea what 'comfortable' or 'walking' means in this context."

It’s frustrating. It feels primitive. For years, the answer was "Just use Elasticsearch," which is basically like saying "Oh, you want to hang a picture? Why not build an entire house first?"

But things have changed. We live in the era of the LLM, and it turns out we can borrow some of that AI magic to make our search bars actually... smart. And the best part? You don't need a PhD in Linear Algebra to do it.

The "What is a Vector?" Talk (The 60-Second Version)

Before we dive into the code, we need to understand what we're actually doing. Normally, we store text as strings. With vector embeddings, we store text as a long array of numbers (a vector).

Imagine a giant 3D map. On this map, the word "Apple" sits near "Fruit" and "iPhone," but nowhere near "Carburetor." When a user searches for something, we turn their query into coordinates on that same map and find the closest items. That’s "Semantic Search."
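To make "closest" a bit more concrete, here's a tiny, illustrative TypeScript sketch of cosine similarity, the measure pgvector's distance operator is built on (we'll let the database do this for real later). The three-number vectors are made up to match the map analogy; real embeddings have around 1,536 dimensions.

// Illustrative only: cosine similarity between two vectors.
// pgvector computes this in the database; the math is just a dot product and two norms.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Made-up 3D "coordinates" for the map analogy
console.log(cosineSimilarity([0.9, 0.1, 0.0], [0.8, 0.2, 0.1])) // ~0.98, "Apple" vs "Fruit"
console.log(cosineSimilarity([0.9, 0.1, 0.0], [0.0, 0.1, 0.9])) // ~0.01, "Apple" vs "Carburetor"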

In a Remix app, this usually involves three players:

  1. OpenAI (or similar): To turn your text into those big arrays of numbers.
  2. PostgreSQL with `pgvector`: To store those numbers and let us run "distance" math on them.
  3. Remix: To glue the UI to the logic without losing our minds.

Setting the Stage: Postgres and pgvector

If you're using Supabase, Neon, or even a local Docker Postgres instance, you probably already have access to pgvector. It’s the industry standard for this stuff now.

First, you’ll need to enable the extension. Run this in your migration or your DB console:

CREATE EXTENSION IF NOT EXISTS vector;

Now, let’s say we’re building a blog or a knowledge base. Our schema (using Prisma, because I’m a fan of type safety) might look like this:

// schema.prisma
model Article {
  id        String   @id @default(cuid())
  title     String
  content   String
  embedding Unsupported("vector(1536)")? // 1536 = the output size of OpenAI's text-embedding-3-small
}

_Note: Prisma doesn’t have first-class support for vector columns yet, which is why the field is declared as Unsupported("vector(1536)"). You’ll also have to drop to raw SQL for writing the embedding and for the search query, but it works._

Step 1: Generating the Embeddings

You can’t search by vectors if you haven't converted your data yet. I usually handle this in a background job or right when the record is created.

Here’s a simple utility function I wrote to talk to OpenAI's text-embedding-3-small model (which is cheaper and better than the old ada-002).

import { OpenAI } from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function generateEmbedding(text: string) {
  // We want to clean the text a bit, remove weird newlines
  const input = text.replace(/\n/g, ' ')

  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input,
  })

  return response.data[0].embedding
}
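Because the embedding column is declared as Unsupported, the regular Prisma client can't write to it, so saving the vector means dropping to raw SQL. Here's a rough sketch of how that could look when a record is created; createArticleWithEmbedding is a name I made up for this example, and db is the same Prisma client from ~/utils/db.server used in the loader below.

import { db } from '~/utils/db.server'
import { generateEmbedding } from '~/utils/ai.server'

// Hypothetical helper: create the row first, then write the vector with raw SQL,
// since the Prisma client can't set an Unsupported("vector") column directly
export async function createArticleWithEmbedding(title: string, content: string) {
  const article = await db.article.create({
    data: { title, content },
  })

  const embedding = await generateEmbedding(`${title}\n\n${content}`)

  // pgvector accepts the '[0.1,0.2,...]' string format, so JSON.stringify works here
  await db.$executeRaw`
    UPDATE "Article"
    SET embedding = ${JSON.stringify(embedding)}::vector
    WHERE id = ${article.id};
  `

  return article
}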

Step 2: The Search Loader

This is where Remix shines. We want a search page where the URL keeps track of the query. No complex state management, just a good old loader.

Here’s the thing about vector search: you can't just do db.article.findMany(). You have to find the "cosine similarity" between the user's query and your stored vectors.

_Pro Tip: Always use text-embedding-3-small or -large now. They're significantly more efficient than older models and let you truncate dimensions if you're worried about storage._
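If you do want to trim the vectors, the text-embedding-3 models accept a dimensions parameter. Here's a sketch of a variant of the earlier generateEmbedding function (same file, same openai client); just remember the Postgres column would need to be vector(512) instead of vector(1536) to match.

// Variant of generateEmbedding that asks OpenAI for shorter vectors
export async function generateSmallEmbedding(text: string) {
  const input = text.replace(/\n/g, ' ')

  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input,
    dimensions: 512, // shorter vectors to save storage; the DB column must match
  })

  return response.data[0].embedding
}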

In your Remix route (e.g., app/routes/search.tsx):

import { json, type LoaderFunctionArgs } from '@remix-run/node'
import { Form, useLoaderData } from '@remix-run/react'
import { generateEmbedding } from '~/utils/ai.server'
import { db } from '~/utils/db.server'

export const loader = async ({ request }: LoaderFunctionArgs) => {
  const url = new URL(request.url)
  const query = url.searchParams.get('q')

  if (!query) return json({ results: [] })

  // 1. Turn the user's query into a vector
  const queryEmbedding = await generateEmbedding(query)

  // 2. Query Postgres using a raw SQL template for cosine similarity
  // The <=> operator is the "cosine distance" operator in pgvector
  const results = await db.$queryRaw`
    SELECT id, title, content,
    1 - (embedding <=> ${JSON.stringify(queryEmbedding)}::vector) as similarity
    FROM "Article"
    WHERE 1 - (embedding <=> ${JSON.stringify(queryEmbedding)}::vector) > 0.7
    ORDER BY similarity DESC
    LIMIT 5;
  `

  return json({ results })
}

export default function SearchPage() {
  const { results } = useLoaderData<typeof loader>()

  return (
    <div className="p-8 max-w-2xl mx-auto">
      <Form method="get" className="mb-8">
        <input
          type="text"
          name="q"
          placeholder="Search for something abstract..."
          className="border p-2 w-full rounded shadow-sm"
        />
      </Form>

      <div className="space-y-4">
        {results.map((result: any) => (
          <div
            key={result.id}
            className="p-4 border rounded hover:bg-gray-50 transition"
          >
            <h2 className="text-xl font-bold">{result.title}</h2>
            <p className="text-sm text-gray-600">
              Match score: {Math.round(result.similarity * 100)}%
            </p>
          </div>
        ))}
      </div>
    </div>
  )
}

So, Why Is This Better?

The magic here is in the "Search for something abstract" part.

If I have an article about "React performance optimization," and a user searches for "Making my website faster," a keyword search would likely fail. But the embeddings for "performance optimization" and "making my website faster" are mathematically very close.

Suddenly, your app feels like it has a brain. It _understands_ intent.

The "Gotchas" (Because it’s never as easy as the tutorial)

Now, I’d be lying if I said this was all sunshine and rainbows. Here are a few things that tripped me up:

  1. The Cold Start Latency: Generating an embedding for the query takes a trip to OpenAI. This adds 200-500ms to your search. It’s not a dealbreaker, but you'll want a nice loading spinner in your Remix UI so the user doesn't think the app died (there's a small sketch of that after this list).
  2. Indexing: Once you get past a few thousand rows, pgvector queries get slow. You’ll need to add an HNSW (Hierarchical Navigable Small World) index. It sounds like a sci-fi gadget, but it's just a way to make the math faster (also sketched below).
  3. Cost: OpenAI charges per token. It's pennies for a small app, but if you're re-indexing your entire database every time someone fixes a typo, it adds up. I suggest only updating the embedding when the _actual_ core content changes.
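For the loading state in gotcha #1, Remix already has what we need: useNavigation reports when a loader is in flight. Here's a small, hypothetical SearchStatus component you could render inside SearchPage, above the results list:

import { useNavigation } from '@remix-run/react'

// Shows a pending message while the search loader (and its OpenAI round-trip) is running
export function SearchStatus() {
  const navigation = useNavigation()
  const isSearching =
    navigation.state === 'loading' &&
    new URLSearchParams(navigation.location?.search ?? '').has('q')

  if (!isSearching) return null
  return <p className="text-sm text-gray-500">Thinking really hard...</p>
}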
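And for the index in gotcha #2, it's one line of SQL. You'd normally put it in a migration, but here's a rough sketch of running it through the same Prisma client (the index name is made up):

import { db } from '~/utils/db.server'

// One-off script / migration step: an HNSW index for the cosine distance (<=>) operator.
// Without it, pgvector scans every embedding on every search.
export async function createEmbeddingIndex() {
  await db.$executeRawUnsafe(`
    CREATE INDEX IF NOT EXISTS "Article_embedding_hnsw_idx"
    ON "Article" USING hnsw (embedding vector_cosine_ops);
  `)
}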

Wrapping Up

Implementing vector search in Remix isn't the daunting task it was two years ago. We don't need to build custom Python microservices or maintain massive clusters. With pgvector and a simple API call, we can give our users an experience that actually feels modern.

The struggle isn't the math—it's just knowing which tools to plug together. Remix gives us the perfect scaffold to handle these async operations gracefully, keeping our UI snappy while the heavy lifting happens on the server.

Now, if you'll excuse me, I have to go figure out why my "Smart Search" keeps suggesting pancake recipes when I search for "Flattish UI components." (Actually, that might just be a feature).

Happy coding!