Stage 3: SSTables -- Flushing to Disk

stage 3 of 4 · ~30 min · runs in your browser

Objective:Flush the memtable to sorted, immutable on-disk segments and layer reads across them.

SSTable (Sorted String Table)Immutable on-disk segmentsMulti-layer read path (memtable -> segments, newest first)Tombstone markers for deletesRead amplificationWrite O(1) vs read O(N segments)

A memtable can't grow forever. When it fills, LSM engines flush it to disk as an SSTable -- an immutable, sorted run of key-value pairs. Reads then check the memtable first, then SSTables newest-to-oldest: the most recent write always wins.

Extend your store with:

store.flush() -- moves the current memtable into store.segments as { entries: [[key, value], ...] } sorted by key, then empties the memtable. Deletes flush too, as [key, null] tombstones (you need them later -- an older segment may still hold the key).
get(key) -- memtable first; on miss, scan segments from newest to oldest; a tombstone (null) means "deleted", stop looking.
store.segments -- newest segment last in the array.

Why sorted? Binary search. An SSTable in production is a file on disk; finding a key in O(log n) requires the file to be sorted. You'll use a sorted entries array to model this. In LevelDB/RocksDB, the SSTable also has a sparse index and Bloom filter at the end -- keys pointing to blocks -- but the sort is the primitive.

Why immutable? Two reasons: (1) sequential writes are fast, in-place updates are not; (2) multiple readers can read the same SSTable file concurrently without locks. Immutability is a core correctness and performance property.

Why this matters in production

The memtable can't grow forever -- it's RAM. When it fills, LSM engines flush it to an SSTable (Sorted String Table): a sorted, immutable file on disk. From this point forward, reads must check ALL layers: memtable first (RAM, newest), then segments from newest to oldest. This multi-layer read path is the defining characteristic of LSM engines and the root cause of both their read amplification problem and their write advantage. After Stage 3 your engine has the same fundamental architecture as LevelDB's L0 flush -- you're missing only the index (Bloom filter + sparse index) that LevelDB adds to avoid scanning every SSTable byte by byte.

tests (6)

○ flush produces a sorted immutable segment

○ flush clears the memtable

○ reads fall through to segments after flush

○ the memtable beats older segments

○ newer segments beat older segments

○ tombstones flush and hide older values

createStore.js