
Stop Measuring Query Latency: Why the BUFFERS Flag Is the Only Real Way to Audit Postgres Performance

Discover why execution time is a misleading metric in local development and how the BUFFERS flag reveals the true shared-buffer pressure of your production queries.

· 5 min read

I’ve spent way too many hours staring at a local Postgres instance, hitting the "execute" button five times in a row, then averaging the time in my head like some sort of human-benchmark hybrid. It feels productive, but it’s actually a complete waste of time. Your local machine is a liar—it has a faster NVMe drive than your production server, different RAM constraints, and absolutely zero "noisy neighbor" traffic.

If you are optimizing a query based on the millisecond count in your dev environment, you are optimizing for a ghost.

To actually understand if a query is efficient, you have to look at the work being done, not how long the work took. In Postgres, that work is measured in pages: 8KB blocks (the default block size) read from disk or found in the buffer cache. If you want the truth, you need the BUFFERS flag.
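You can confirm the page size on your own instance; 8KB (8192 bytes) is the compile-time default on virtually every build:

```sql
-- Show the page (block) size this server was compiled with.
SHOW block_size;
```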

The Latency Trap

Latency is a "side effect" of performance, not a measure of it.

Imagine you have a query that takes 10ms on your laptop. You push it to production, and suddenly it’s taking 200ms. What happened? Did the query change? No. The cache changed. On your laptop, the data was already in memory. On production, under heavy load, Postgres had to go to the disk.

When we use EXPLAIN ANALYZE, we get the execution time. But if we add BUFFERS, we get the receipt.

-- The way most people do it (The Lie)
EXPLAIN ANALYZE 
SELECT * FROM orders WHERE customer_id = 420;

-- The way you should do it (The Truth)
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM orders WHERE customer_id = 420;

Reading the "Receipt"

When you run that second query, Postgres spits out a block that looks like this:

Buffers: shared hit=423 read=15 dirtied=2

This is the most important line in your entire query plan. Here is the breakdown:

* shared hit: The data was already in the Postgres buffer cache (RAM). This is fast.
* shared read: The data was not in the cache; Postgres had to fetch it from the OS (which might go to disk). This is slow.
* dirtied: This query modified pages (via writes, or even during a SELECT when hint bits get set) that must eventually be flushed to disk. This costs write I/O.

Why does this matter more than time? Because the buffer count is effectively deterministic for a given plan and dataset. If your query hits 5,000 buffers on your laptop against production-sized data, it’s going to hit roughly 5,000 buffers in production. The *time* it takes to touch those buffers will vary wildly, but the *work* stays the same.

Case Study: The "Fast" Sequential Scan

I once saw a query that looked perfectly fine. It was scanning a table of 50,000 rows and took about 5ms. The developer said, "It's fast enough, we don't need an index."

But look at the BUFFERS output:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE email = 'test@example.com';

-- Output snippet:
-- Seq Scan on users  (cost=0.00..1205.00 rows=1 width=155) (actual time=4.8..5.1 rows=1 loops=1)
--   Filter: (email = 'test@example.com'::text)
--   Rows Removed by Filter: 49999
--   Buffers: shared hit=890

shared hit=890. Every time this query runs, it’s touching 890 pages of memory. That’s ~7MB of data being cycled through the cache for a single row.

Now, imagine 1,000 users hit that endpoint simultaneously. You’re now asking Postgres to churn through 7GB of memory traffic just to find 1,000 emails. Your "fast" 5ms query just choked your production database because you ran out of cache space, forcing other queries to go to disk.
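The back-of-the-envelope math here is just pages times page size. You can even let Postgres do the arithmetic for you (the 890 figure comes from the plan above):

```sql
-- 890 pages * 8192 bytes per page ≈ 7MB per execution
SELECT pg_size_pretty(890 * 8192::bigint)        AS per_query,
       pg_size_pretty(1000 * 890 * 8192::bigint) AS per_1000_queries;
```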

After adding an index:

-- After adding index on email
-- Index Scan using idx_users_email on users ...
--   Buffers: shared hit=3

From 890 buffers to 3. That is a 296x reduction in "work," even if the execution time only dropped by 4ms. That’s how you scale.
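For completeness, the index behind that second plan would look something like this (the name idx_users_email comes from the plan snippet; the table and column come from the query):

```sql
-- B-tree index so the email lookup becomes an index scan
-- instead of a full-table sequential scan.
CREATE INDEX idx_users_email ON users (email);
```

On a live production table you’d typically use CREATE INDEX CONCURRENTLY instead, so the build doesn’t block writes.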

The Cold Cache Problem

The first time you run a query, it's usually slow. The second time, it's fast. This is because of "warm" vs "cold" caches.

If you only look at latency, you might think you fixed the query, but you just got lucky. BUFFERS shows you the shared read count. If read is high, you're relying on disk. If hit is high, you're relying on RAM.

When auditing, you should be trying to minimize the *total sum* of (hit + read). If that number is high, your query is "heavy," regardless of how fast it feels.
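If you want this view across your whole workload rather than one query at a time, the pg_stat_statements extension tracks the same counters per statement. Assuming the extension is installed (it needs `shared_preload_libraries = 'pg_stat_statements'`), a query like this ranks statements by total buffer traffic:

```sql
-- Rank statements by total buffer work: hit + read.
SELECT query,
       calls,
       shared_blks_hit,
       shared_blks_read,
       shared_blks_hit + shared_blks_read AS total_buffers
FROM pg_stat_statements
ORDER BY total_buffers DESC
LIMIT 10;
```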

How to use this in your workflow

Stop looking at the clock. Start looking at the page count.

1. Run your query with EXPLAIN (ANALYZE, BUFFERS).
2. Ignore the execution time for a moment.
3. Look at the total Buffers. Is it in the hundreds? Fine. Is it in the tens of thousands? You have a problem.
4. Look for `shared read`. If this is high, your production environment is going to feel the pain of disk I/O latency.
5. Refactor or Index until that buffer count drops.
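To bake this into your workflow without manually prefixing every query, the auto_explain module can log plans, including the Buffers lines, for anything slower than a threshold. A minimal session-level sketch (LOAD requires superuser; the 100ms threshold is an arbitrary example):

```sql
-- Log the full plan, with buffer counts, for any query over 100ms.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 100;
SET auto_explain.log_analyze = on;
SET auto_explain.log_buffers = on;
```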

I’ve found that aimlessly adding indexes is a great way to bloat your database. But using BUFFERS gives you a surgical way to see exactly where the pressure is coming from. It’s the difference between guessing why your car is slow and actually opening the hood to see the smoke.

Next time someone tells you a query is "fast," ask them how many buffers it touched. If they don't know, they're just guessing.