
.idx.* — accelerator indexes

Attaches a secondary index to a column vector. The index is metadata — the underlying data is untouched (and shared via copy-on-write with any other holder of the same vector). The four index kinds cover the common query shapes:

  • zone: min/max and null count per zone. Range predicates on numeric and temporal columns can skip any zone that cannot match.
  • hash: open-addressing hash set of keys. Expected O(1) point lookups for = predicates.
  • sort: sorted permutation. Enables binary search and order-aware merges.
  • bloom: Bloom filter for set membership. Probabilistic (false positives are possible, false negatives are not) and very compact.

Indexes are dropped automatically when the column is mutated (alter set, update, insert). A failed mutation leaves the data unchanged, so the index is kept. An index survives being wrapped in a .col.link, and a link survives an index drop: the two metadata layers are independent.
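The invalidation rules can be modelled in a few lines. The sketch below is Python, used only as an illustration: Vec, attach_index, drop_index and set_at are hypothetical names, not part of the .idx API, and a Python tuple stands in for the engine's copy-on-write buffer. It shows three properties: a successful mutation clears the index, a failed one leaves it in place, and dropping an index does not touch the link slot.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Vec:
    """Hypothetical model: immutable data plus two independent metadata slots."""
    data: Tuple[int, ...]
    index: Optional[str] = None  # e.g. "zone", "hash", "sort", "bloom"
    link: Optional[str] = None   # stands in for a .col.link target


def attach_index(v: Vec, kind: str) -> Vec:
    # Metadata only: the new Vec refers to the same data tuple.
    return replace(v, index=kind)


def drop_index(v: Vec) -> Vec:
    # Clears the index slot; the link slot is left as it was.
    return replace(v, index=None)


def set_at(v: Vec, i: int, x: int) -> Vec:
    # Validate before building anything: on error, v and its index are untouched.
    if not isinstance(x, int):
        raise TypeError("type mismatch")
    if not 0 <= i < len(v.data):
        raise IndexError("position out of range")
    new_data = v.data[:i] + (x,) + v.data[i + 1:]
    # A successful mutation invalidates the index but keeps the link.
    return replace(v, data=new_data, index=None)


v = attach_index(Vec((5, 1, 9), link="dim"), "zone")
try:
    set_at(v, 0, "oops")          # fails: wrong element type
except TypeError:
    pass
assert v.index == "zone"          # failed mutation: index kept
w = set_at(v, 0, 7)
assert w.index is None and w.link == "dim"  # index dropped, link kept
assert drop_index(v).link == "dim"          # link survives an index drop
```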

Reference

Function      Arity   Description
.idx.zone     unary   Attach a zone (min/max) index.
.idx.hash     unary   Attach a hash-set index.
.idx.sort     unary   Attach a sorted-permutation index.
.idx.bloom    unary   Attach a Bloom filter.
.idx.drop     unary   Detach whatever index the vector carries.
.idx.has?     unary   True if the vector carries any index.
.idx.info     unary   Dict describing the attached index.

.idx.zone

Signature: (.idx.zone v). Returns v with a zone index attached. info keys: kind, length, parent_type, saved_attrs, min, max, n_nulls. For float columns min/max are f64; for integer / temporal columns they are i64.

(set tv  [5 1 9 3 7])
(set tvi (.idx.zone tv))
(.idx.has? tvi)   ;; => true
(.idx.info tvi)   ;; => {kind: 'zone, length: 5, min: 1, max: 9, n_nulls: 0, ...}
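Zone elimination can be sketched in Python as follows. ZONE_SIZE = 4 is an illustrative assumption (this page does not state the engine's zone length), None stands in for null, and build_zones / zones_to_scan are hypothetical names. Each zone keeps (min, max, n_nulls); a range predicate only needs to scan zones whose [min, max] interval overlaps the query range.

```python
from typing import List, Optional, Sequence, Tuple

ZONE_SIZE = 4  # illustrative; not the engine's actual zone length

# (min, max, n_nulls); min/max are None when a zone holds only nulls
Zone = Tuple[Optional[int], Optional[int], int]


def build_zones(col: Sequence[Optional[int]], zone_size: int = ZONE_SIZE) -> List[Zone]:
    zones: List[Zone] = []
    for start in range(0, len(col), zone_size):
        chunk = col[start:start + zone_size]
        vals = [x for x in chunk if x is not None]
        n_nulls = len(chunk) - len(vals)
        if vals:
            zones.append((min(vals), max(vals), n_nulls))
        else:
            zones.append((None, None, n_nulls))
    return zones


def zones_to_scan(zones: List[Zone], lo: int, hi: int) -> List[int]:
    """Zone numbers that may contain a value in [lo, hi]; all others are skipped."""
    keep = []
    for z, (zmin, zmax, _) in enumerate(zones):
        if zmin is None or zmax is None:
            continue  # all-null zone: a range predicate never matches null
        if zmax < lo or zmin > hi:
            continue  # zone's [min, max] lies entirely outside [lo, hi]
        keep.append(z)
    return keep


col = [5, 1, 9, 3, 7, 20, 22, None, None, None]
zones = build_zones(col)
print(zones)                         # [(1, 9, 0), (7, 22, 1), (None, None, 2)]
print(zones_to_scan(zones, 10, 15))  # [1]
```

Only zone 1 is read for the predicate 10 ≤ x ≤ 15; zones 0 and 2 are eliminated from their metadata alone.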

.idx.hash

Signature: (.idx.hash v). Returns v with a hash index attached. info keys add capacity (next power of two ≥ length) and n_keys (distinct keys hashed).
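A minimal Python model of an open-addressing hash set, sized as stated above (capacity is the next power of two ≥ length). HashIndex and its methods are hypothetical names, and linear probing is an assumption: this page does not say which probing scheme .idx.hash uses. Because a column whose length is already a power of two can fill the table completely, every probe loop is bounded by the capacity.

```python
from typing import Hashable, Iterator, List, Sequence

_EMPTY = object()  # sentinel marking an unused slot


def next_pow2(n: int) -> int:
    """Smallest power of two that is >= max(n, 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


class HashIndex:
    """Hypothetical open-addressing hash set with linear probing."""

    def __init__(self, keys: Sequence[Hashable]) -> None:
        self.capacity = next_pow2(len(keys))
        self.slots: List[object] = [_EMPTY] * self.capacity
        self.n_keys = 0  # distinct keys stored
        for key in keys:
            self._insert(key)

    def _probe(self, key: Hashable) -> Iterator[int]:
        mask = self.capacity - 1           # capacity is a power of two
        i = hash(key) & mask
        for _ in range(self.capacity):     # bounded: the table may be full
            yield i
            i = (i + 1) & mask

    def _insert(self, key: Hashable) -> None:
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                self.slots[i] = key
                self.n_keys += 1
                return
            if slot == key:
                return                     # duplicate key: nothing to add
        raise RuntimeError("table full")   # unreachable: distinct keys <= capacity

    def contains(self, key: Hashable) -> bool:
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return False               # an empty slot ends the probe chain
            if slot == key:
                return True
        return False                       # probed every slot of a full table


h = HashIndex([5, 1, 9, 3, 7, 1])
print(h.capacity, h.n_keys)          # 8 5
print(h.contains(9), h.contains(4))  # True False
```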

.idx.sort

Signature: (.idx.sort v). Returns v with a sorted permutation attached. info keys add perm_len.
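A sorted-permutation index stores positions rather than values, so the column keeps its original order. The Python sketch below answers an inclusive range query by binary-searching the permutation; sort_perm, lower_bound, upper_bound and range_lookup are hypothetical names, and perm_len in .idx.info would correspond to len(perm) here.

```python
from typing import List, Sequence


def sort_perm(v: Sequence[int]) -> List[int]:
    """Positions of v ordered by value (Python's sort is stable for ties)."""
    return sorted(range(len(v)), key=lambda i: v[i])


def lower_bound(v: Sequence[int], perm: List[int], key: int) -> int:
    """First slot in perm whose value is >= key (binary search, O(log n))."""
    lo, hi = 0, len(perm)
    while lo < hi:
        mid = (lo + hi) // 2
        if v[perm[mid]] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def upper_bound(v: Sequence[int], perm: List[int], key: int) -> int:
    """First slot in perm whose value is > key."""
    lo, hi = 0, len(perm)
    while lo < hi:
        mid = (lo + hi) // 2
        if v[perm[mid]] <= key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def range_lookup(v: Sequence[int], perm: List[int], a: int, b: int) -> List[int]:
    """Original positions i with a <= v[i] <= b, in ascending value order."""
    return perm[lower_bound(v, perm, a):upper_bound(v, perm, b)]


v = [5, 1, 9, 3, 7]
perm = sort_perm(v)
print(perm)                         # [1, 3, 0, 4, 2]
print(range_lookup(v, perm, 3, 7))  # [3, 0, 4]
```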

.idx.bloom

Signature: (.idx.bloom v). Returns v with a Bloom filter attached. info keys add m_bits (bit-array size, a power of two), k (number of hash functions) and n_keys (distinct keys inserted). With m = m_bits, the false-positive rate is approximately (1 - e^(-k*n/m))^k, the standard Bloom-filter estimate for m bits, k hash functions and n keys.
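A small Python Bloom filter, using m_bits = 64 and k = 4 to mirror the .idx.info example below. Deriving the k bit positions by double hashing from a SHA-256 digest is an assumption made for this illustration, not the engine's documented hash scheme, and Bloom / expected_fp_rate are hypothetical names. The helper evaluates the standard estimate (1 - e^(-kn/m))^k.

```python
import hashlib
import math
from typing import Hashable, Iterator


class Bloom:
    """Hypothetical Bloom filter over an m_bits-wide bit array."""

    def __init__(self, m_bits: int, k: int) -> None:
        if m_bits <= 0 or m_bits & (m_bits - 1):
            raise ValueError("m_bits must be a power of two")
        self.m_bits, self.k = m_bits, k
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, key: Hashable) -> Iterator[int]:
        # Double hashing: position_i = h1 + i*h2 (mod m). An odd h2 is coprime
        # with a power-of-two m, so the k positions are distinct while k <= m.
        d = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1
        mask = self.m_bits - 1
        for i in range(self.k):
            yield (h1 + i * h2) & mask

    def add(self, key: Hashable) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: Hashable) -> bool:
        # False: definitely absent. True: present, or a false positive.
        return all((self.bits >> p) & 1 for p in self._positions(key))


def expected_fp_rate(m: int, k: int, n: int) -> float:
    """Standard estimate (1 - e^(-k*n/m))^k for m bits, k hashes, n keys."""
    return (1.0 - math.exp(-k * n / m)) ** k


b = Bloom(m_bits=64, k=4)
for x in [1, 2, 3, 4, 5]:
    b.add(x)
assert all(b.might_contain(x) for x in [1, 2, 3, 4, 5])  # no false negatives
print(round(expected_fp_rate(64, 4, 5), 4))               # 0.0052
```

With five keys in 64 bits and four hash functions, the estimate is about 0.0052, i.e. roughly one false positive per two hundred probes for absent keys.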

.idx.drop

Signature: (.idx.drop v). Returns v with no index attached. Identity on an unindexed vector.

(set t (.idx.drop tvi))
(.idx.has? t)   ;; => false

.idx.has?

Signature: (.idx.has? v). Returns a bool. Safe to call on non-vectors (returns false).

(.idx.has? tvi)   ;; => true
(.idx.has? tv)    ;; => false
(.idx.has? 42)    ;; => false

.idx.info

Signature: (.idx.info v). Returns a dict if the vector carries an index, or the null object if not.

(.idx.info (.idx.bloom [1 2 3 4 5]))
;; => {kind: 'bloom, length: 5, m_bits: 64, k: 4, n_keys: 5, ...}

See also