$ memista

HNSW vs flat: choosing for embeddable workloads

2026-05-21 · ann · hnsw · indexing

There are exactly two vector indexes anyone running on a single machine needs to think about first: a flat (brute-force) index and HNSW. Everything else — IVF, ScaNN, DiskANN, product quantization — is a tuning knob on one of those two ideas. For embeddable workloads, this is the choice that matters, and the answer is rarely a clean “use HNSW.”

What flat actually is

A flat index is a list of vectors. To query, you compute the distance from your query vector to every vector in the list, sort by distance, and return the top-k. That is the entire algorithm. There is no graph, no clustering, no learned structure, no parameter to tune beyond the distance metric.

This is sometimes called brute-force or exact search. It has three properties that matter:

In modern code a flat search is not as slow as it sounds. With SIMD distance kernels — simsimd on x86 with AVX-512 or on Apple Silicon with NEON — a 384-dim cosine over a hundred thousand vectors takes a few milliseconds on a laptop. The constant factor is small; the slope is the problem.

What HNSW actually is

HNSW (Hierarchical Navigable Small World) is a multi-layer proximity graph. Each vector is a node. Edges connect nearby vectors. Layers thin out as you go up — the top layer is sparse, the bottom is dense. To search, you start at an entry point in the top layer, greedily walk toward your query in each layer, descend, and repeat.

This is approximate. You might miss the true top-k. Recall depends on three parameters:

The three together define a recall/latency/memory triangle. USearch’s defaults — which memista accepts by passing 0 for all three to the IndexOptions constructor — pick reasonable values that bias toward generality rather than any particular workload. If you have a specific recall target, you tune them; if you don’t, you accept what USearch picks.

The honest comparison

flatHNSW
Recall100%~95–99% (tunable)
Query latencyO(N · d)O(log N · d) amortized
MemoryN · d · 4 bytes (f32)flat + graph overhead (~M · N · 8 bytes typical)
Insert latencyO(d) (append)O(M · log N · d) (graph stitching)
Update / deleteeasypossible but involved
Persistencetrivial (an array)a graph (USearch handles this)

For a few thousand vectors, the flat index wins on every axis that matters. You don’t have memory overhead, you don’t have approximation error, you don’t have a build phase, and the query is fast enough.

For a few hundred thousand vectors, HNSW starts to dominate on latency while still costing a manageable amount of memory.

For a few million, HNSW is the only choice unless you are happy with hundreds of milliseconds per query.

The line is data-dependent

There is no universal threshold where flat stops winning. The crossover depends on:

The right move on an embeddable system is to write your retrieval against an interface that admits both, measure with your actual data, and pick.

What memista picks

memista picks HNSW, because it uses USearch and USearch is an HNSW implementation. The metric is MetricKind::IP (inner product); the quantization is ScalarKind::F32; the connectivity and expansion parameters are USearch defaults. SIMD acceleration is on via the simsimd feature.

That choice is reasonable for the project’s stated audience — embed in a Rust binary, ship retrieval as part of an app — but it has implications you should think about:

A small heuristic

For a project starting today, I’d reach for flat when:

I’d reach for HNSW (and therefore something like memista or USearch directly) when:

Between 10k and 50k, the answer is “measure.” On embeddable workloads, the constants matter more than the asymptotics, and the only constants that count are the ones on your hardware with your vectors.

The shape of the question is not “which index is best” but “which index is best for the shape of work I’m doing this quarter.” That shape changes; the right code admits both.

A practical pattern: keep both around

The cheapest way to stay honest about which index you need is to keep both implementations available behind a trait, and select at startup:

trait VectorIndex {
    fn add(&mut self, key: u64, v: &[f32]) -> Result<()>;
    fn search(&self, q: &[f32], k: usize) -> Vec<(u64, f32)>;
    fn len(&self) -> usize;
}

struct FlatIndex { /* Vec<(u64, Vec<f32>)> */ }
struct HnswIndex { /* USearch handle */ }

The cost of carrying both is a few hundred lines and one extra trait boundary. The payoff is that when you migrate, you don’t migrate — you flip a config. It also gives you a built-in recall validator: search the same query through both indexes during development, compare top-k. HNSW recall should track flat closely; if it doesn’t, something is wrong with your parameters or your embeddings.

memista does not do this today. The crate-level interface is “USearch with batteries”; if you need flat, you wrap or fork. For new code, it is worth considering whether the trait boundary lives in your application rather than in the library.

What to measure

If you take only one thing from this post: don’t pick on theory. Build the smallest possible benchmark with your actual vectors:

The answer that comes out of this is hardware-specific, dataset-specific, embedding-model-specific, and query-distribution-specific. None of those are visible from outside. Trust your numbers, not someone else’s slide.

For most embeddable workloads — under a million vectors, single process, modest recall budget — the answer ends up being “HNSW with USearch defaults is fine.” Which is the bet memista is making. Whether that bet pays off for your workload is what the measurement tells you.

← all posts