Indexes Overview

A map of every index-like structure Rayforce ships — per-column accelerators, vector ANN indexes, linked columns, partition pruning, and graph indices. One mental model, then a decision matrix that points you at the right tool.

What an "index" means in Rayforce

An index in Rayforce is a precomputed, optional structure that rides alongside the data it's built for. It is not a separate database object: it lives on the column or table it indexes, survives copy / refcount semantics, and travels with the data through the query pipeline. How deeply queries consult that structure varies by kind, but HNSW, linked columns, partition pruning, CSR, and the four .idx.* accelerators are all read by their respective query paths. See "What's wired today" below.

Three properties hold for every kind of index documented on this page:

  • Opt-in. Indexes are built explicitly when you decide the build cost is worth it. The system never builds one behind your back.
  • Mutation-aware. Mutating the underlying data drops or invalidates the index by design — a stale index is a wrong-answer bug, so the runtime refuses to keep one. Rebuild after a write.
  • Transient by default. Per-column accelerators live in memory only; HNSW handles can be persisted to disk explicitly. The on-disk file format for ordinary tables never carries an index. After loading, rebuild whatever you want indexed.
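The opt-in and mutation-aware properties can be pictured with a small Python model. This is a conceptual sketch only, not Rayforce code: the class name IndexedColumn and its methods are hypothetical, and a plain dict stands in for whatever structure the runtime actually builds.

```python
class IndexedColumn:
    """Toy column that carries an optional index and drops it on any write."""

    def __init__(self, values):
        self.values = list(values)
        self.index = None  # opt-in: nothing is built until build_hash() is called

    def build_hash(self):
        # Map each value to the list of row ids where it occurs.
        index = {}
        for row, value in enumerate(self.values):
            index.setdefault(value, []).append(row)
        self.index = index

    def set(self, row, value):
        self.values[row] = value
        self.index = None  # a stale index would give wrong answers, so drop it

    def find(self, value):
        if self.index is not None:
            return self.index.get(value, [])  # indexed path
        # No index attached: fall back to a linear scan.
        return [r for r, v in enumerate(self.values) if v == value]


col = IndexedColumn([10, 20, 10, 30])
col.build_hash()
has_index_before = col.index is not None
col.set(0, 30)                      # the write invalidates the index
has_index_after = col.index is not None
rows_for_30 = col.find(30)          # still correct, served by the scan
```

After the write the column answers correctly from the scan; rebuilding the index is a separate, explicit step.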

The five index-like structures

1. Per-column accelerators — .idx.zone / .idx.hash / .idx.sort / .idx.bloom

Attach one of four kinds to a numeric vector. Each kind builds a structure suited to a different query shape: hash for equality lookups, sort for binary search and ordered access, zone for column-level min/max/null pruning, bloom for cheap probabilistic membership rejection. All four occupy the same per-column slot — one kind at a time today.
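To make two of these query shapes concrete, here is a minimal Python sketch of a zone summary and a sort permutation. It illustrates the general techniques only; the function names are hypothetical, None stands in for a null, and nothing here reflects Rayforce's internal layout.

```python
from bisect import bisect_left, bisect_right


def build_zone(values):
    """Column-level summary: min, max, and null count (None models a null)."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "nulls": len(values) - len(present),
    }


def zone_rules_out_eq(zone, constant):
    """True when no row can equal `constant`, so the scan can be skipped."""
    if zone["min"] is None:  # column holds only nulls
        return True
    return constant < zone["min"] or constant > zone["max"]


def build_sort(values):
    """Ascending row-id permutation over a column without nulls."""
    return sorted(range(len(values)), key=lambda row: values[row])


def range_rows(values, perm, lo, hi):
    """Row ids with lo <= value <= hi, located by binary search."""
    # Materializing the keys keeps the sketch short; a real index would
    # binary-search through the permutation without copying.
    keys = [values[row] for row in perm]
    start = bisect_left(keys, lo)
    stop = bisect_right(keys, hi)
    return sorted(perm[start:stop])


prices = [40, 12, 75, 33, 58]
zone = build_zone(prices)
perm = build_sort(prices)
in_range = range_rows(prices, perm, 30, 60)
```

The zone check answers "can anything match?" in constant time, while the permutation narrows a range predicate to a contiguous slice.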

Today's status: all four are built, inspectable via (.idx.info), and consulted by the executor at six routing sites covering filter comparisons (equality and range), filter IN, ORDER BY (single ascending key), distinct, and find. The routing is transparent: a query against an indexed column uses the fast path; the same query without the index falls back to the linear scan and returns identical results.
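The transparency claim, same answer with or without the index, can be sketched in Python for the ORDER BY and distinct cases. The function names are hypothetical, and this sketch returns distinct values in ascending order as a simplifying assumption; the output order of Rayforce's own distinct is not stated here.

```python
def order_by_asc(values, perm=None):
    """Row ids in ascending value order, reusing a sort permutation if present."""
    if perm is not None:
        return list(perm)  # fast path: the permutation already is the answer
    return sorted(range(len(values)), key=lambda row: values[row])  # fallback


def distinct(values, perm=None):
    """Distinct values in ascending order (ordering is an assumption)."""
    if perm is None:
        return sorted(set(values))  # fallback: no index attached
    out = []
    for row in perm:  # equal values sit next to each other in sorted order
        if not out or out[-1] != values[row]:
            out.append(values[row])
    return out


qty = [5, 2, 5, 9, 2]
# Stand-in for a prebuilt sort index: a stable ascending permutation.
perm = sorted(range(len(qty)), key=lambda row: qty[row])

same_order = order_by_asc(qty, perm) == order_by_asc(qty)
same_distinct = distinct(qty, perm) == distinct(qty)
```

Both comparisons hold, which is the property the routing relies on: the index changes the cost of the query, not its result.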

Surface: (.idx.zone v), (.idx.hash v), (.idx.sort v), (.idx.bloom v), (.idx.drop v), (.idx.has? v), (.idx.info v). Numeric only in v1 (RAY_BOOL through RAY_TIMESTAMP at the C level; integer / float / date / time / timestamp vectors are the practical reach from Rayfall); RAY_SYM / RAY_STR are deferred.

See: Accelerator Indexes (reference) · Indexes Guide: choosing a kind.

2. Vector ANN index — HNSW

Hierarchical Navigable Small World multi-layer proximity graph for approximate nearest neighbor search over float embedding vectors. Three distance metrics — cosine, L2, inner product. Built once with hnsw-build, queried with ann, optionally persisted to a directory with hnsw-save / hnsw-load.

Surface: (hnsw-build col [metric] [M] [ef_c]), (ann handle query k [ef_search]), (knn col query k [metric]), (hnsw-save handle dir), (hnsw-load dir), (hnsw-free handle), (hnsw-info handle). Brute-force knn needs no index and exists alongside.
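For intuition about what the brute-force path computes, the following Python sketch scores every row under each of the three metrics and keeps the best k. It is an illustration, not the Rayforce implementation: the metric strings "l2", "cosine", and "ip" are placeholder names chosen for this sketch, and vectors are assumed to be non-zero so that the cosine norm is defined.

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # Assumes neither vector is all zeros.
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norm


def knn(vectors, query, k, metric="l2"):
    """Exhaustive k-nearest-neighbor search: score every row, keep the best k."""
    if metric == "l2":
        scored = [(l2(v, query), row) for row, v in enumerate(vectors)]
    elif metric == "cosine":
        scored = [(cosine_distance(v, query), row) for row, v in enumerate(vectors)]
    elif metric == "ip":
        # A larger inner product means more similar, so negate to sort ascending.
        scored = [(-inner_product(v, query), row) for row, v in enumerate(vectors)]
    else:
        raise ValueError(f"unknown metric: {metric}")
    scored.sort()
    return [row for _, row in scored[:k]]


embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
query = [1.0, 0.0]
nearest_l2 = knn(embeddings, query, 2, "l2")
nearest_cos = knn(embeddings, query, 2, "cosine")
nearest_ip = knn(embeddings, query, 2, "ip")
```

The exhaustive scan costs one distance computation per row. An HNSW index trades exactness for visiting only a small part of the graph per query.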

See: Vector Search & HNSW · Indexes Guide: ANN workflow.

3. Linked columns

A column whose values are row-id references into another table. Functions as a row-level index: dereferencing follows the link and resolves the target row at query time, similar in spirit to a foreign-key relationship but maintained at the column level.

Surface: (.col.link col target-table), (.col.unlink col), (.col.link? col), (.col.target col).
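Conceptually, dereferencing a linked column is an indexed lookup into the target table's columns. The Python sketch below models that with plain lists; the table contents and the deref helper are hypothetical and say nothing about how Rayforce stores or resolves links internally.

```python
# Fact table: `customer` holds row ids into the `customers` table.
orders = {
    "order_id": [100, 101, 102],
    "customer": [2, 0, 2],
}

# Dimension table addressed by row position.
customers = {
    "name": ["Ada", "Grace", "Edsger"],
    "country": ["UK", "US", "NL"],
}


def deref(link_column, target_table, field):
    """Follow each row id into the target table and fetch `field`."""
    target = target_table[field]
    return [target[row_id] for row_id in link_column]


order_countries = deref(orders["customer"], customers, "country")
```

No join result is materialized: each access follows the stored row id directly to the target row.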

Parted-table interaction: a parted fact can carry a linked column targeting a non-parted dim (in-memory or splayed); per-segment HAS_LINK is preserved through ray_read_parted and segment streaming. Targets with any parted column are rejected at attach time. See Linked Columns: Parted-Table Interaction.

See: Linked Columns.

4. Partition pruning

A storage layout, not a column-level index, but it functions as a coarse zone-map at the table level: the partition discriminator (date, integer, or symbol) selects whole sub-tables to load. Filters that target the partition column let the optimizer skip entire partitions before any scan begins.

Surface: implicit — the directory layout under your database root drives partition selection. The C API loader (ray_part_load) infers the partition type (date / int / sym) from the directory names.
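The pruning idea can be sketched in Python as a filter over partition directory names that runs before any data is read. The directory-name format used here (YYYY.MM.DD) and both helper functions are assumptions made for illustration; the source does not specify how partition directories are named.

```python
from datetime import date


def parse_partition(name):
    """Parse a directory name such as '2024.01.15' (format is an assumption)."""
    year, month, day = (int(part) for part in name.split("."))
    return date(year, month, day)


def prune(partition_names, lo, hi):
    """Keep only partitions whose discriminator lies in [lo, hi]."""
    return [n for n in partition_names if lo <= parse_partition(n) <= hi]


partitions = ["2024.01.14", "2024.01.15", "2024.01.16", "2024.01.17"]

# A filter on the partition column selects whole sub-tables up front.
selected = prune(partitions, date(2024, 1, 15), date(2024, 1, 16))
```

Only the selected partitions would then be loaded and scanned; the other two are never touched.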

See: Columnar Storage · Storage Guide: partitioned tables · Block Offloading.

5. CSR graph index

A double-indexed Compressed Sparse Row adjacency structure (forward + reverse) attached to graph relationships. Used transparently by every graph opcode — OP_EXPAND, OP_VAR_EXPAND, OP_SHORTEST_PATH, OP_WCO_JOIN — and by Leapfrog Triejoin for worst-case optimal joins.

Surface: none directly — the CSR is built when a relationship is loaded and consulted automatically by graph queries. There is no (.csr.*) Rayfall surface today.
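As background on the data structure itself, here is a small Python sketch of a forward and reverse CSR pair built from an edge list, with a breadth-first traversal on top. It shows the general CSR technique under the assumption of dense integer node ids; the function names are hypothetical and the layout is not claimed to match Rayforce's.

```python
from collections import deque


def build_csr(num_nodes, edges):
    """Build CSR (offsets, targets) arrays from (src, dst) pairs."""
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count out-degree per source
    for node in range(num_nodes):
        offsets[node + 1] += offsets[node]  # prefix sum gives slice bounds
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()       # next free slot for each source
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(csr, node):
    offsets, targets = csr
    return targets[offsets[node]:offsets[node + 1]]


def bfs_order(csr, start):
    """Visit nodes reachable from `start` in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in neighbors(csr, node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
forward = build_csr(4, edges)                                 # outgoing edges
reverse = build_csr(4, [(dst, src) for src, dst in edges])    # incoming edges
```

Keeping both directions lets a traversal expand along outgoing or incoming edges with the same contiguous-slice lookup.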

See: Graph Storage · Graph Algorithms.

Pick the right kind

Match the shape of your query to the structure that fits it.

Want to… | Structure | Active today?
Skip whole columns where a predicate constant lies outside the value range | .idx.zone: min/max plus null count | Yes: O(1) all/none short-circuit at the filter site (int + float)
Make repeated = / in / find over a numeric column O(matches) instead of O(n) | .idx.hash: chained open-addressing table | Yes: hash probe at filter EQ, filter IN, and find sites
Binary-search a numeric column for range predicates or reuse the permutation for ORDER BY / distinct | .idx.sort: ascending row-id permutation | Yes: filter range (EQ/LT/LE/GT/GE), single-key ASC ORDER BY, distinct
Cheaply reject "definitely not in this set" probes for integer EQ filters | .idx.bloom: m-bit probabilistic filter | Yes: definite-absent proof at the filter EQ site (integer-family only)
Find the k nearest neighbors of an embedding vector by cosine, L2, or inner product | HNSW: (hnsw-build) + (ann) | Yes: (ann) consults the index
Resolve a cross-table reference at query time without a materialized join | Linked column: (.col.link) | Yes: column dereference resolves through the link
Skip whole sub-tables in a parted dataset based on the partition discriminator | Partition pruning: date / int / sym partitioning | Yes: optimizer pass rewrites filters
Traverse a graph (BFS, shortest path, betweenness, MST) | CSR: transparent under graph opcodes | Yes: every graph opcode reads CSR directly
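The bloom entry in the matrix depends on a one-sided guarantee: a bloom filter can prove a value is absent but can only suggest that it is present. The Python sketch below shows that behavior at an equality filter. It is illustrative only: the Bloom class, its default sizes, and the SHA-256-based hashing are choices made for this sketch, not details of .idx.bloom.

```python
import hashlib


class Bloom:
    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit keeps the sketch simple

    def _positions(self, value):
        # Derive k bit positions by hashing the value with k different seeds.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means definitely absent; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


def filter_eq(values, bloom, constant):
    """Equality filter that skips the scan when the bloom proves absence."""
    if not bloom.might_contain(constant):
        return []  # definite-absent: no row can match
    # Possibly present (or a false positive): the scan decides.
    return [row for row, v in enumerate(values) if v == constant]


ids = [7, 42, 1001, 42]
bloom = Bloom()
for v in ids:
    bloom.add(v)

hits = filter_eq(ids, bloom, 42)
misses = filter_eq(ids, bloom, 5)
```

A false positive costs one wasted scan, never a wrong answer, which is why the filter is safe to consult ahead of the linear path.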

What's wired today

Rayforce is honest about phasing. The structures above all build correctly; integration depth varies by kind.

  • Per-column accelerators — build kernels, the (.idx.*) surface, and executor routing are all shipped and tested. The executor consults indexes at six sites covering filter comparisons (equality and range), filter IN, ORDER BY, distinct, and find; every fast path falls back to the scan on any miss and returns identical results. See Routing Table for the full matrix and eligibility conditions. Set RAY_IDX_STATS=1 to observe consult/hit counts at exit.
  • HNSW — fully wired. (ann) consults the index immediately.
  • Linked columns — fully wired. Dereference resolves through the link at query time.
  • Partition pruning — the optimizer's pruning pass skips partitions whose discriminator falls outside a filter predicate's range. See Pipeline & Optimizer.
  • CSR — fully wired and used by every graph opcode.

Where to go next

  • Indexes Guide — procedural walk-through: when to build, how to choose, common workflows, lifecycle gotchas.
  • Accelerator Indexes — full reference for the .idx.* family, including the routing table, eligibility conditions, and observability.
  • Vector Search & HNSW — ANN reference and worked examples.
  • Linked Columns — reference for cross-table references.
  • Columnar Storage — partitioning and on-disk layout.