
A Quiet Collision in the Memory Model

Why your lock-free concurrent logic fails on ARM but runs perfectly on x86, and how the Atomics API restores the ordering you assumed you had.


Most of us write JavaScript under the comforting illusion that code executes exactly in the order it appears on the screen. When we move into the world of SharedArrayBuffer and multi-threaded Workers, that illusion doesn't just crack—it shatters specifically on hardware we previously thought was "just another target."

If you’ve ever built a high-performance, lock-free data structure in Node.js or the browser, you might have noticed something haunting. Your code passes every test on your Intel-based CI server. It runs flawlessly on your developer workstation. But the moment it hits an Apple M-series chip or a Graviton instance on AWS, the logic collapses into a mess of corrupted state and impossible values.

This isn't a bug in the V8 engine, and it isn't a flaw in your syntax. You’ve just hit a collision between the x86 and ARM memory models.

The Mirage of Sequentiality

In a single-threaded environment, the engine can reorder your instructions however it wants, provided the *observable result* is the same. If you assign a = 1 and then b = 2, and b doesn't depend on a, the CPU might decide to flip them. You’ll never know because you’re the only one watching.

In a multi-threaded environment using SharedArrayBuffer, you are no longer alone. Another thread is watching.

Let's look at a classic "Flag" pattern. This is the simplest form of message passing. Thread A writes some data, then sets a flag. Thread B waits for the flag, then reads the data.

// Both threads hold a view over the same buffer:
// const sharedView = new Int32Array(new SharedArrayBuffer(8));

// Thread A (Producer)
sharedView[0] = 42;    // The Data
sharedView[1] = 1;     // The Flag (Ready!)

// Thread B (Consumer)
while (sharedView[1] === 0) {
  // Spin-wait...
}
console.log(sharedView[0]); // Should be 42, right?

On an x86 (Intel/AMD) processor, this almost always works. On an ARM processor, sharedView[0] will occasionally be 0 (or whatever its initial value was) even though sharedView[1] is 1. (Strictly speaking, the JIT compiler is also free to reorder or cache plain accesses, so this pattern is unsound on any architecture; x86 simply masks the problem far more often.)

How is that possible? To understand the "why," we have to stop looking at our code and start looking at the silicon.

The Intel Tax and the ARM Ambush

The reason your code works on x86 is due to something called Total Store Order (TSO). Intel processors are incredibly "conservative" about reordering. Specifically, they generally do not reorder a store with another store. If Thread A writes to index 0 and then index 1, the rest of the system sees them in that order.

ARM (and PowerPC, and RISC-V) is Weakly Ordered.

An ARM chip is more aggressive. It looks at the two writes in Thread A and thinks: *"Writing to index 1 is faster right now because that cache line is already available. I'll do that first, and then I'll finish writing to index 0."*

To the ARM chip, this is an optimization. To your lock-free logic, this is a catastrophe. Thread B sees the flag set to 1, breaks out of its loop, reads the data at index 0, and gets the old value because the CPU hasn't actually finished the write yet.

The Atomics API is Not Just About "Atomicity"

When developers see the Atomics object in JavaScript, they usually think of it as a way to prevent "torn writes"—ensuring that a 32-bit integer is written in one piece rather than two 16-bit chunks. While true, that is actually the least interesting thing Atomics does.

The real power of Atomics.load() and Atomics.store() lies in Memory Fences (or barriers).

When you use Atomics.store(), you aren't just writing a value; you are telling the hardware: *"Every write I performed before this must be globally visible before this value itself becomes visible."*

Let’s rewrite our failing example using the Atomics API.

// Thread A (Producer)
const buffer = new SharedArrayBuffer(1024);
const view = new Int32Array(buffer);

// Regular assignment is "relaxed" - the CPU can reorder this
view[0] = 42; 

// Atomics.store acts as a "Release" fence
Atomics.store(view, 1, 1); 

// Thread B (Consumer)
while (Atomics.load(view, 1) === 0) {
    // Atomics.load acts as an "Acquire" fence
}

// Because of the fences, view[0] is guaranteed to be 42 here
console.log(view[0]); 

Understanding Acquire and Release Semantics

The Atomics API implements a memory model known as Sequential Consistency for Data-Race-Free programs (SC-DRF). Its operations are in fact sequentially consistent, which is even stronger than the "Acquire/Release" semantics we need here:

1. Atomics.store (Release): Ensures that all previous writes (even non-atomic ones) stay *above* the store. They cannot be reordered to happen after the atomic store.
2. Atomics.load (Acquire): Ensures that all subsequent reads stay *below* the load. The CPU cannot speculatively read view[0] before it has finished checking the flag in view[1].

Without these fences, the ARM chip is free to speculatively read view[0] before the loop even finishes, or the producer is free to publish the flag before the data.

A Practical Disaster: The Circular Buffer

Let’s look at a more complex example. Imagine a circular buffer (a Ring Buffer) used to pass messages between a main thread and a worker. We have a head and a tail index.

// A naive, broken Ring Buffer write
function tryPush(value) {
  const head = view[HEAD_INDEX];
  const tail = view[TAIL_INDEX];
  
  if ((head + 1) % SIZE === tail) return false; // Full
  
  view[head] = value;               // Write Data
  view[HEAD_INDEX] = (head + 1) % SIZE; // Move Head
  
  return true;
}

On x86, this might survive for weeks in production. On an M2 Max, this will fail under high load within minutes. The consumer will see the head move, read the data at the new index, and receive garbage because the view[head] = value write hasn't propagated yet.

The fix requires us to be intentional about our barriers:

function tryPush(value) {
  // We use Atomics.load for the indices to ensure we see the latest
  const head = Atomics.load(view, HEAD_INDEX);
  const tail = Atomics.load(view, TAIL_INDEX);
  
  if ((head + 1) % SIZE === tail) return false;
  
  // 1. Write the data (relaxed is fine here because...)
  view[head] = value;
  
  // 2. (...the Atomics.store here creates a 'Release' fence)
  // This guarantees the 'value' write is visible before the 'head' update.
  Atomics.store(view, HEAD_INDEX, (head + 1) % SIZE);
  
  return true;
}
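The consumer side needs the mirror-image discipline. Here is a sketch of a matching tryPop; the slot layout (control indices at 0 and 1, data slots starting at offset 2) and the buffer size are assumptions, since the article doesn't pin down a layout:

```javascript
const SIZE = 8;         // number of data slots (assumed)
const HEAD_INDEX = 0;   // written only by the producer
const TAIL_INDEX = 1;   // written only by the consumer
const DATA_OFFSET = 2;  // data slots live after the two control words

const sab = new SharedArrayBuffer((DATA_OFFSET + SIZE) * Int32Array.BYTES_PER_ELEMENT);
const view = new Int32Array(sab);

function tryPush(value) {
  const head = Atomics.load(view, HEAD_INDEX);
  const tail = Atomics.load(view, TAIL_INDEX);
  if ((head + 1) % SIZE === tail) return false;       // full
  view[DATA_OFFSET + head] = value;                   // plain write: the payload
  Atomics.store(view, HEAD_INDEX, (head + 1) % SIZE); // release: publish the slot
  return true;
}

function tryPop() {
  const head = Atomics.load(view, HEAD_INDEX);        // acquire: see published slots
  const tail = Atomics.load(view, TAIL_INDEX);
  if (head === tail) return undefined;                // empty
  const value = view[DATA_OFFSET + tail];             // plain read, ordered by the acquire
  Atomics.store(view, TAIL_INDEX, (tail + 1) % SIZE); // release: hand the slot back
  return value;
}

tryPush(7);
tryPush(8);
console.log(tryPop(), tryPop(), tryPop()); // → 7 8 undefined
```

Note that this sketch is still strictly single-producer/single-consumer: each control index has exactly one writer, which is precisely what makes the plain data accesses safe between the fences.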

Why Not Use Atomics for Everything?

You might be tempted to just wrap every single array access in Atomics.load or Atomics.store. While this would make your code "safe," it would also make it slow.

Atomic operations are expensive. They aren't just regular memory accesses; they often require the CPU to flush store buffers and coordinate with other cores' caches.

- Regular Access: Extremely fast, uses the CPU cache effectively, but subject to reordering.
- Atomic Access: Slower, enforces ordering, ensures cache coherency across cores.

The art of lock-free programming in JavaScript is using Atomics only at the synchronization points. You use them for flags, counters, and pointers. Once you've used an atomic operation to "acquire" a piece of memory, you can often read the actual data within that memory using regular, fast array access, provided your protocol guarantees no other thread is writing to it.

The "Happens-Before" Relationship

To debug these issues, you need to think in terms of the "Happens-Before" relationship defined in the ECMAScript memory model.

If event $A$ *happens-before* event $B$, then $B$ must see the effects of $A$.

In our flag example:
1. view[0] = 42 happens-before Atomics.store(view, 1, 1).
2. Atomics.store(view, 1, 1) in Thread A happens-before Atomics.load(view, 1) in Thread B returning 1.
3. Atomics.load(view, 1) returning 1 happens-before console.log(view[0]).

Because the "happens-before" relationship is transitive, the write to view[0] is guaranteed to be visible to the console.log. If you replace Atomics.store with a plain view[1] = 1, the chain is broken. There is no longer a guaranteed relationship between the write in Thread A and the read in Thread B on weakly ordered hardware.

The Sneaky Case of Atomics.wait and notify

Wait/Notify are the heavy hitters of the Atomics API. They allow a thread to go to sleep and be woken up by another thread.

// Thread B (Waiting)
const result = Atomics.wait(view, 1, 0); // Sleep only while view[1] is still 0
if (result === 'ok' || result === 'not-equal') {
    console.log(view[0]); // 'not-equal' means the flag was already set on arrival
}

// Thread A (Waking)
view[0] = 99;
Atomics.store(view, 1, 1);
Atomics.notify(view, 1, 1);

Crucially, Atomics.wait performs a sequentially consistent load before deciding to sleep, so a woken thread sees every write published before the matching Atomics.store. (Note that Atomics.wait throws on the browser's main thread; blocking waits belong in workers, though Node.js allows them on its main thread.) A common mistake is writing the payload *after* calling Atomics.store but *before* calling Atomics.notify: a consumer that is spin-polling rather than sleeping can observe the flag and read the data before you have finished writing it.

Always remember: The signal must come after the payload.
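The third argument to Atomics.wait is what makes this race-free: the check "is view[1] still 0?" and the decision to sleep happen atomically, so a notify cannot slip in between them. You can probe the three return values in a single thread, using the timeout so nothing blocks forever:

```javascript
const view = new Int32Array(new SharedArrayBuffer(4));

// The value has already changed: wait refuses to sleep.
view[0] = 1;
console.log(Atomics.wait(view, 0, 0));      // → 'not-equal'

// The value still matches: block for up to 10 ms, then give up.
view[0] = 0;
console.log(Atomics.wait(view, 0, 0, 10));  // → 'timed-out'
```

In a browser, this snippet must run inside a worker, since Atomics.wait throws on the main thread; Node.js permits it on the main thread, which makes it convenient for experiments like this.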

Testing for the Impossible

One of the hardest parts of this "Quiet Collision" is that it's nearly impossible to test for on a standard x86 CI runner (like GitHub Actions' default runners). Your tests will pass 100% of the time.

To truly vet lock-free JS code, you have three options:
1. ARM-specific CI: Use AWS Graviton or Oracle Cloud ARM instances for your test runners.
2. Stress Testing: Write tests that loop millions of times, specifically designed to catch reordering.
3. Formal Verification: (A bit extreme for most) Using tools to model your state machine and ensure no invalid states exist.

For most of us, simply following the rule of "Always use Atomics for shared state control" is enough.

Summary of the Rules

If you are working with SharedArrayBuffer, keep these laws in mind:

1. Don't trust x86. Just because it works on your Intel Mac doesn't mean your logic is sound.
2. Identify your Synchronization Points. Any variable that signals the status of *other* memory must be accessed via Atomics.
3. Respect the Fence. Use Atomics.store to "publish" data and Atomics.load to "consume" data.
4. Avoid Read-Modify-Write Races. If you need to increment a counter, don't do view[0]++. Use Atomics.add(view, 0, 1).
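Rule 4 deserves a concrete shape. view[0]++ compiles to a separate load and store, so two threads can both read the same old value and one increment vanishes. Atomics.add performs the read-modify-write indivisibly, and Atomics.compareExchange generalizes the idea to arbitrary update loops. (The incrementCapped helper below is a hypothetical illustration, not a standard API.)

```javascript
const view = new Int32Array(new SharedArrayBuffer(4));

Atomics.add(view, 0, 1); // safe even with other threads incrementing concurrently
Atomics.add(view, 0, 1);
console.log(Atomics.load(view, 0)); // → 2

// Hypothetical helper: an increment that saturates at `cap`, built on a
// classic compare-and-swap retry loop.
function incrementCapped(view, index, cap) {
  for (;;) {
    const current = Atomics.load(view, index);
    if (current >= cap) return current;
    // Commit only if no other thread changed the value in the meantime.
    if (Atomics.compareExchange(view, index, current, current + 1) === current) {
      return current + 1;
    }
  }
}

console.log(incrementCapped(view, 0, 3)); // → 3
console.log(incrementCapped(view, 0, 3)); // → 3 (already at the cap)
```

The compareExchange loop is the workhorse of lock-free programming: read, compute, then commit only if the world hasn't moved underneath you, retrying if it has.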

The memory model is a contract. On x86, the hardware gives you more than you asked for. On ARM, the hardware gives you exactly what is in the contract and not a bit more. By using the Atomics API correctly, you ensure your JavaScript is robust enough to survive the move from the world of Intel to the future of ARM.

Lock-free programming is a high-wire act. Atomics is your safety net. Don't try to perform without it.