The C++ memory model and CAS loops

The C++ Memory Model

Goal: understand the difference between "it works on my machine" and "correct in theory" (provably correct).

Before C++11, there was no standard concurrency model. pthreads and the Windows API were de facto standards, but not part of C++.
Compilers and optimizations could break your assumptions: as far as the standard was concerned, sharing objects across threads was unconditionally UB. Reordering, caching, and dead store elimination were all permitted.
Portable code was hard: code that worked on x86 might crash on ARM due to its weaker memory ordering.
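
A minimal sketch of the kind of assumption that breaks, assuming a hypothetical helper `wait_for_worker` (not from the original notes). With a plain `bool done`, the spin loop would be a data race, and the compiler could legally hoist the load out of the loop and spin forever. Using `std::atomic<bool>` makes the cross-thread access well-defined:

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: wait until a worker thread signals completion.
// If `done` were a plain bool, `while (!done) {}` would be a data race (UB),
// and the optimizer could read `done` once and never look at it again.
bool wait_for_worker()
{
    std::atomic<bool> done{false};

    std::thread worker([&done] {
        done.store(true);              // default ordering: memory_order_seq_cst
    });

    while (!done.load()) {             // an atomic load is performed on every iteration
        std::this_thread::yield();
    }

    worker.join();
    return done.load();
}
```

Calling `wait_for_worker()` returns `true` once the worker has run. The point is not the result but that the loop is guaranteed to observe the store eventually, on any conforming implementation.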

C++11 introduced a formal, portable concurrency model in the language standard.
It defines exactly what the compiler and CPU can and cannot do with memory operations across threads.
It made multi-threaded C++ first-class, not just a wrapper around OS APIs.
Your code is only "correct" if it conforms to the memory model. If it works without conforming, you're relying on stronger hardware guarantees than the standard provides.

Takeaway: The memory model constrains the implementation. This allows C++ users to reason about concurrent behavior.

The memory model is the foundation that types like std::thread, std::mutex, or std::atomic rely on.
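
As an illustration, here is a small sketch of how these library types combine. The function name `count_with_mutex` and its parameters are made up for this example. The guarantees it relies on come from the memory model: unlocking a mutex synchronizes with the next lock of the same mutex, and the completion of a thread synchronizes with the return of `join()`.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example: several threads increment one plain (non-atomic) counter.
// The mutex orders the increments, so there is no data race.
long count_with_mutex(int num_threads, int increments_per_thread)
{
    long counter = 0;
    std::mutex m;

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(m);  // unlock synchronizes with the next lock
                ++counter;
            }
        });
    }

    for (auto& th : threads) {
        th.join();   // thread completion synchronizes with join() returning
    }
    return counter;  // safe to read: all increments happen before this point
}
```

For example, `count_with_mutex(4, 10000)` returns `40000`. Without the `lock_guard`, the same code would contain a data race and its behavior would be undefined.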

Important concepts:

threads of execution
reordering & visibility
- by the compiler
- by the CPU/GPU (also cache coherence protocol)
sequenced before
synchronizes with
happens before
forward progress (consider what a program requires to not be useless)
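
The three relations above can be sketched with the classic message-passing pattern. This is an illustrative example, not part of the original notes; the function name `pass_message` and the value `42` are arbitrary.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: hand a plain int from one thread to another.
int pass_message()
{
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // the synchronization variable
    int received = -1;

    std::thread producer([&] {
        payload = 42;                                  // (1) sequenced before (2)
        ready.store(true, std::memory_order_release);  // (2) release store
    });

    std::thread consumer([&] {
        // (3) acquire load; once it reads the value stored by (2),
        //     (2) synchronizes with (3)
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        received = payload;                            // (4) sequenced after (3)
    });

    producer.join();
    consumer.join();
    return received;
}
```

Chaining the relations gives: (1) is sequenced before (2), (2) synchronizes with (3), and (3) is sequenced before (4). Therefore (1) happens before (4), the read of `payload` is not a data race, and `pass_message()` returns `42`.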

Takeaway: Don't introduce a data race. 😀

volatile is not a tool relevant to the happens-before relation: it does not synchronize threads. It's a tool for I/O, such as memory-mapped device registers.
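
A hedged sketch of the two separate jobs. Both function names are invented for illustration, and the "register" is just a pointer supplied by the caller; real memory-mapped I/O depends on the platform.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// I/O use: accesses to a volatile object are observable behavior, so the
// compiler must perform both stores, even though the second looks redundant.
void write_command_twice(volatile std::uint32_t* reg)
{
    *reg = 1;
    *reg = 1;
}

// Inter-thread use: volatile would NOT help here. Concurrent ++ on a
// `volatile int` is still a data race. std::atomic is the right tool.
int count_atomically(int num_threads, int increments_per_thread)
{
    std::atomic<int> counter{0};

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);  // atomic read-modify-write
            }
        });
    }

    for (auto& th : threads) {
        th.join();   // join() provides the happens-before edge for the final read
    }
    return counter.load();
}
```

For example, `count_atomically(4, 10000)` returns `40000`. Relaxed ordering is enough in this sketch because only the counter itself is shared and the final read comes after all threads are joined.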

Edited Nov 26, 2025 by Matthias Kretz