Collections
Vectors, lists, tables, dictionaries, and selection bitmaps — the compound data structures that organize scalar values into queryable datasets.
Vectors
Vectors are the fundamental columnar data structure in Rayforce. A vector is a typed, contiguous array of scalar elements — every element shares the same type. Vectors are the columns inside tables and the operands in every DAG operation.
; I64 vector (integer literals)
ray> [1 2 3 4 5]
[1 2 3 4 5]
; F64 vector (float literals)
ray> [1.0 2.5 3.14]
[1.0 2.5 3.14]
; SYM vector (unquoted identifiers)
ray> [AAPL GOOG MSFT]
[AAPL GOOG MSFT]
; BOOL vector
ray> [true false true]
[true false true]
Morsel Iteration
All vector processing in Rayforce happens in morsels — fixed-size chunks of 1024 elements. The executor never processes an entire column at once. Instead, it iterates morsel by morsel, which keeps data in L1/L2 cache and enables pipeline parallelism.
// C API: morsel iteration
ray_morsel_t m;
ray_morsel_init(&m, vec);
while (ray_morsel_next(&m)) {
// m.base — pointer to element data for this morsel
// m.count — number of elements (up to 1024)
// m.selection — RAY_SEL bitmap (NULL = all pass)
process_morsel(&m);
}
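To make the chunking concrete, the following is a minimal, self-contained sketch of how a morsel iterator over a plain int64_t array could work. The names sketch_morsel_t, sketch_morsel_init, sketch_morsel_next, and sketch_sum are hypothetical and do not reflect the real ray_morsel_t layout; only the 1024-element morsel size comes from the text, and selection handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MORSEL_SIZE 1024  /* morsel size stated in the text */

/* Hypothetical iterator state; NOT the real ray_morsel_t layout. */
typedef struct {
    const int64_t *data;  /* start of the whole column */
    size_t len;           /* total number of elements */
    size_t offset;        /* index of the next unvisited element */
    const int64_t *base;  /* start of the current morsel */
    size_t count;         /* elements in the current morsel */
} sketch_morsel_t;

static void sketch_morsel_init(sketch_morsel_t *m, const int64_t *data, size_t len) {
    m->data = data;
    m->len = len;
    m->offset = 0;
    m->base = NULL;
    m->count = 0;
}

/* Advance to the next morsel; returns 0 once the column is exhausted. */
static int sketch_morsel_next(sketch_morsel_t *m) {
    if (m->offset >= m->len) return 0;
    size_t remaining = m->len - m->offset;
    m->base = m->data + m->offset;
    m->count = remaining < SKETCH_MORSEL_SIZE ? remaining : SKETCH_MORSEL_SIZE;
    m->offset += m->count;
    return 1;
}

/* Sum a column one morsel at a time; the last morsel may be short. */
static int64_t sketch_sum(const int64_t *data, size_t len) {
    sketch_morsel_t m;
    int64_t total = 0;
    sketch_morsel_init(&m, data, len);
    while (sketch_morsel_next(&m)) {
        for (size_t i = 0; i < m.count; i++) total += m.base[i];
    }
    return total;
}
```

The inner loop only ever touches one morsel's worth of contiguous data, which is what keeps the working set cache-resident.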
Null Handling
Nulls are tracked via a per-vector bitmap, not sentinel values in the data array. This means operations can use fast SIMD loops on the data and only check the null bitmap when needed.
// C API: null bitmap
ray_vec_set_null(vec, 3, true); // mark index 3 as null
ray_vec_is_null(vec, 3); // returns true
ray_vec_is_null(vec, 0); // returns false
The bitmap is stored inline in the first 16 bytes of the ray_t header for vectors with up to 128 elements. Larger vectors use an external bitmap allocation (flagged with RAY_ATTR_NULLMAP_EXT). The RAY_ATTR_HAS_NULLS flag on the vector indicates whether any nulls exist at all — when clear, the bitmap is never checked.
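The inline/external split can be illustrated with a small standalone sketch. Everything here (sketch_vec_t and the sketch_* functions) is hypothetical and is not the real ray_t header; the bit order within a byte and the decision to leave the has-nulls flag set after a null is cleared are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_INLINE_BITS 128  /* 16 inline bytes, per the text */

/* Hypothetical vector header; NOT the real ray_t layout. */
typedef struct {
    size_t len;
    int has_nulls;           /* plays the role of RAY_ATTR_HAS_NULLS */
    uint8_t inline_map[16];  /* used when len <= 128 */
    uint8_t *ext_map;        /* used when len > 128 (the RAY_ATTR_NULLMAP_EXT case) */
} sketch_vec_t;

/* Returns 0 on success, -1 if the external bitmap cannot be allocated. */
static int sketch_vec_init(sketch_vec_t *v, size_t len) {
    memset(v, 0, sizeof *v);
    v->len = len;
    if (len > SKETCH_INLINE_BITS) {
        v->ext_map = calloc((len + 7) / 8, 1);
        if (!v->ext_map) return -1;
    }
    return 0;
}

static uint8_t *sketch_map(sketch_vec_t *v) {
    return v->ext_map ? v->ext_map : v->inline_map;
}

static void sketch_set_null(sketch_vec_t *v, size_t idx, int is_null) {
    uint8_t *map = sketch_map(v);
    if (is_null) {
        map[idx / 8] |= (uint8_t)(1u << (idx % 8));
        v->has_nulls = 1;
    } else {
        /* Assumption: the flag stays set; it is only a conservative hint. */
        map[idx / 8] &= (uint8_t)~(1u << (idx % 8));
    }
}

static int sketch_is_null(sketch_vec_t *v, size_t idx) {
    if (!v->has_nulls) return 0;  /* fast path: bitmap never consulted */
    return (sketch_map(v)[idx / 8] >> (idx % 8)) & 1;
}

static void sketch_vec_free(sketch_vec_t *v) {
    free(v->ext_map);
    v->ext_map = NULL;
}
```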
COW Semantics
Vectors use copy-on-write (COW) reference counting. Multiple consumers can share the same vector via ray_retain(). Mutation goes through ray_cow(), which returns the same pointer if the reference count is 1, or a fresh copy if shared.
// C API: COW pattern
ray_retain(vec); // rc: 1 → 2 (shared)
ray_t* writable = ray_cow(vec);
if (writable != vec) {
// Got a fresh copy — vec is still shared
// Must release writable on error paths
}
// Safe to mutate writable
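For readers unfamiliar with the pattern, here is a minimal standalone sketch of reference-counted copy-on-write. sketch_buf_t and the sketch_* functions are hypothetical stand-ins, not the Rayforce implementation. In particular, this sketch leaves the original's reference count untouched when it copies; whether ray_cow() also drops the caller's reference to the shared original is not stated in the text.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted buffer; NOT the real ray_t. */
typedef struct {
    int rc;          /* reference count */
    size_t len;      /* number of elements */
    int64_t data[];  /* element storage (flexible array member) */
} sketch_buf_t;

static sketch_buf_t *sketch_new(size_t len) {
    sketch_buf_t *b = calloc(1, sizeof *b + len * sizeof(int64_t));
    if (b) { b->rc = 1; b->len = len; }
    return b;
}

static void sketch_retain(sketch_buf_t *b) { b->rc++; }

static void sketch_release(sketch_buf_t *b) {
    if (b && --b->rc == 0) free(b);
}

/* Return a buffer that is safe to mutate: the input itself when it has a
 * single owner, otherwise a fresh copy with rc == 1. Returns NULL if the
 * copy cannot be allocated. Assumption: the original's rc is not changed. */
static sketch_buf_t *sketch_cow(sketch_buf_t *b) {
    if (b->rc == 1) return b;
    sketch_buf_t *copy = sketch_new(b->len);
    if (!copy) return NULL;
    memcpy(copy->data, b->data, b->len * sizeof(int64_t));
    return copy;
}
```

Writes to the returned copy never reach the other holders of the original, which is the property that makes sharing columns between consumers safe.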
Vector Operations
| C Function | Description |
|---|---|
| ray_vec_new(type, cap) | Allocate an empty vector with capacity |
| ray_vec_append(vec, elem) | Append one element (may reallocate) |
| ray_vec_set(vec, idx, elem) | Set element at index |
| ray_vec_get(vec, idx) | Get pointer to element at index |
| ray_vec_slice(vec, off, len) | Zero-copy slice (shares data) |
| ray_vec_concat(a, b) | Concatenate two vectors |
| ray_vec_from_raw(type, data, n) | Create from existing data array |
| ray_str_vec_append(vec, s, len) | Append a string to RAY_STR vector |
| ray_str_vec_get(vec, idx, &len) | Get string at index |
Lists
Lists are boxed, heterogeneous containers. Each element is a ray_t* pointer to any Rayforce object. Lists are the backbone of table column storage: a table's columns are held in a list.
; Create a list of mixed vectors
ray> (list [1 2 3] [A B C])
([1 2 3] [A B C])
; Lists can hold any type
ray> (list 42 "hello" [1 2])
(42 "hello" [1 2])
// C API: list operations
ray_t* lst = ray_list_new(4); // initial capacity 4
lst = ray_list_append(lst, vec1); // append element
lst = ray_list_append(lst, vec2);
ray_t* item = ray_list_get(lst, 0); // get by index
lst = ray_list_set(lst, 0, new_item); // replace at index
Lists have type = RAY_LIST (0) and store pointers in their data[] array. The len field tracks the number of elements.
Tables
A table is a collection of named column vectors, all the same length. Tables are the primary data structure for analytical queries — the target of select, update, joins, and aggregations.
; Create a table with column names and data
ray> (set t (table [sym price qty] (list [AAPL GOOG MSFT] [150.0 140.0 380.0] [100 200 50])))
sym price qty
-----------------
AAPL 150.0 100
GOOG 140.0 200
MSFT 380.0 50
; Query with select
ray> (select {from:t where: (> price 145.0)})
sym price qty
-----------------
AAPL 150.0 100
MSFT 380.0 50
Internal Structure
A table is a ray_t with type = RAY_TABLE (98). Internally it contains:
- Schema — an I64 vector of symbol IDs, one per column name
- Columns — a list of typed vectors, one per column
// C API: table construction
ray_t* tbl = ray_table_new(3); // 3 columns
ray_table_add_col(tbl, sym_id, col_vec); // add named column
// Access
ray_t* col = ray_table_get_col(tbl, sym_id); // by name (symbol ID)
ray_t* col0 = ray_table_get_col_idx(tbl, 0); // by position (distinct name avoids redeclaring col)
int64_t nr = ray_table_nrows(tbl); // row count
int64_t nc = ray_table_ncols(tbl); // column count
ray_t* sch = ray_table_schema(tbl); // I64 vec of col name IDs
Column Name Management
| C Function | Description |
|---|---|
| ray_table_col_name(tbl, idx) | Get symbol ID of column at index |
| ray_table_set_col_name(tbl, idx, id) | Rename column at index |
| ray_table_schema(tbl) | Get the full schema as an I64 vector |
Dictionaries
Dictionaries share the same physical layout as tables — a 2-pointer block (type = RAY_DICT, len = 2) holding a keys container and a vals container. Pair count is keys->len; the helpers ray_dict_keys / ray_dict_vals / ray_dict_len / ray_dict_get / ray_dict_upsert / ray_dict_remove wrap the layout.
Supported keys shapes:
- A typed vector: any of RAY_SYM, RAY_BOOL, RAY_U8, RAY_I16, RAY_I32, RAY_I64, RAY_F32, RAY_F64, RAY_DATE, RAY_TIME, RAY_TIMESTAMP, RAY_STR, RAY_GUID. Lookup is value-equality; the keys' null bitmap is honored so 0Nl doesn't collide with a real 0.
- A RAY_LIST of boxed atoms, used for heterogeneous keys (the only shape that can hold both 'sym and "str" keys in the same dict). Lookup falls back to atom_eq.
Supported vals shapes:
- A typed vector when every value shares one atom type (the form group emits; its index columns happen to all be RAY_I64 vectors held inside a RAY_LIST; see below).
- A RAY_LIST when values are heterogeneous, are themselves containers (vectors / dicts / tables / functions), or need to stay unevaluated. Dict literals always use this shape because the parser leaves expression values unevaluated until probed.
The {…} literal is the convenient surface but is narrower than the API. The parser only emits two key shapes:
- Bareword keys → RAY_SYM vec. Identifier characters (letters, digits, _, -) are all interned as symbols, so {1: "a"} looks like an integer-keyed dict but actually stores the symbol named "1", and (at d 1) with an i64 atom will miss.
- Quoted-string keys → RAY_STR vec. A mix of bareword and quoted keys in the same literal falls back to a RAY_LIST of boxed atoms.
To build a dict whose keys are real RAY_I64 / RAY_GUID / temporal values — i.e. ones a numeric or temporal lookup atom will actually match — use (dict keys vals) with an explicitly typed keys vector or call ray_dict_new from C.
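The value-equality lookup over a typed keys vector, with the null bitmap honored, can be sketched in standalone C as follows. sketch_key_find is a hypothetical helper, not part of the Rayforce API. It assumes a null key slot holds 0 in the data array and that a null probe matches only a null key; neither detail is confirmed by the text. One byte per null flag is used here for brevity instead of a packed bitmap.

```c
#include <stddef.h>
#include <stdint.h>

/* Linear search over I64 keys. A non-null probe matches only non-null keys
 * with an equal value; a null probe matches only a null key (assumption).
 * Returns the index of the first match, or -1 if there is none. */
static ptrdiff_t sketch_key_find(const int64_t *keys, const uint8_t *is_null,
                                 size_t n, int64_t probe, int probe_is_null) {
    for (size_t i = 0; i < n; i++) {
        if (is_null[i]) {
            /* The stored value in a null slot is ignored entirely. */
            if (probe_is_null) return (ptrdiff_t)i;
            continue;
        }
        if (!probe_is_null && keys[i] == probe) return (ptrdiff_t)i;
    }
    return -1;
}
```

With keys {null, 0, 7}, probing for a real 0 skips the null slot even though its underlying storage also holds 0, which is the collision the null bitmap is there to prevent.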
; Dictionary literal (curly braces)
ray> {name: "Alice" age: 30 active: true}
{name: "Alice" age: 30 active: true}
; Access by key
ray> (set d {x: 10 y: 20})
ray> (get d 'x)
10
Dictionaries are used extensively in Rayfall for passing named arguments to query forms like select and update:
; The select argument is a dictionary
ray> (select {from:t where: (> x 1) cols: {x:x x2: (* x x)}})
Selection Bitmaps (RAY_SEL)
A selection bitmap is a lazy filter representation used internally by the query optimizer and executor. Instead of materializing filtered rows into a new vector, Rayforce tracks which rows pass the filter as a compact bitmap.
Segment Flags
Selections are organized in segments matching the morsel size (1024 elements). Each segment carries a flag that enables fast short-circuiting:
| Flag | Constant | Meaning |
|---|---|---|
| NONE | RAY_SEL_NONE (0) | All bits zero: skip the entire morsel, no rows pass |
| ALL | RAY_SEL_ALL (1) | All bits one: process without checking the bitmap |
| MIX | RAY_SEL_MIX (2) | Mixed bits: must check the bitmap per row |
The same bitmaps carry filters through OP_EXPAND chains (sideways information passing) and compose multiple predicates by ANDing bitmaps, all without copying data.
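Composing two predicates by ANDing, with the segment flags used to short-circuit, might look like the sketch below. sketch_seg_t bundles one segment's flag, popcount, and bits into a single struct purely for illustration; it is not the RAY_SEL layout, and sketch_seg_and is a hypothetical helper. The flag values are the ones listed in the Segment Flags table. The sketch assumes full 1024-row segments and that an ALL segment stores all-one bits with a popcount of 1024.

```c
#include <stdint.h>

#define SEG_WORDS 16  /* 16 x 64 bits = 1024 rows per segment */

enum { SEL_NONE = 0, SEL_ALL = 1, SEL_MIX = 2 };

/* Hypothetical per-segment view; NOT the real RAY_SEL layout. */
typedef struct {
    uint8_t flag;
    int32_t popcount;
    uint64_t bits[SEG_WORDS];
} sketch_seg_t;

/* Portable bit count (Kernighan's method). */
static int32_t sketch_popcount64(uint64_t x) {
    int32_t n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* out = a AND b for one full segment. */
static void sketch_seg_and(const sketch_seg_t *a, const sketch_seg_t *b,
                           sketch_seg_t *out) {
    if (a->flag == SEL_NONE || b->flag == SEL_NONE) {
        /* Either side rejects every row: no bitmap words are read. */
        out->flag = SEL_NONE;
        out->popcount = 0;
        for (int i = 0; i < SEG_WORDS; i++) out->bits[i] = 0;
        return;
    }
    if (a->flag == SEL_ALL) { *out = *b; return; }  /* ALL is the identity */
    if (b->flag == SEL_ALL) { *out = *a; return; }

    /* MIX AND MIX: combine word by word and recompute the summary. */
    int32_t n = 0;
    for (int i = 0; i < SEG_WORDS; i++) {
        out->bits[i] = a->bits[i] & b->bits[i];
        n += sketch_popcount64(out->bits[i]);
    }
    out->popcount = n;
    out->flag = (uint8_t)(n == 0 ? SEL_NONE : (n == 64 * SEG_WORDS ? SEL_ALL : SEL_MIX));
}
```

Only the MIX/MIX case touches the bit arrays; the other cases resolve from the flags alone, so two sparse filters often combine without scanning most segments.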
Block Layout
A RAY_SEL object has type = 14 and a variable-size layout in its data[] region:
- Segment flags: one uint8_t per morsel (NONE/ALL/MIX)
- Segment popcounts: one int32_t per morsel (number of set bits)
- Bit arrays: 16 uint64_t words per morsel (1024 bits)
// C API: bitmap manipulation
RAY_SEL_BIT_TEST(bits, row); // test if row passes
RAY_SEL_BIT_SET(bits, row); // mark row as passing
RAY_SEL_BIT_CLR(bits, row); // mark row as filtered
// Convert a BOOL vector to a selection bitmap
ray_t* sel = ray_sel_from_pred(bool_vec);
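As a rough guide to how much space the three regions take, and to what the bit macros plausibly do, here is a standalone sketch. The sketch_* names are hypothetical. Any padding or alignment between the regions and the bit order within a word (least-significant bit first here) are assumptions; the text only gives the per-morsel element sizes.

```c
#include <stddef.h>
#include <stdint.h>

#define SEG_ROWS  1024
#define SEG_WORDS (SEG_ROWS / 64)  /* 16 words per segment */

/* Byte sizes of the three regions for a selection over n_rows rows. */
typedef struct {
    size_t n_segs;
    size_t flags_bytes;   /* one uint8_t per segment */
    size_t popcnt_bytes;  /* one int32_t per segment */
    size_t bits_bytes;    /* 16 uint64_t words per segment */
} sketch_sel_sizes_t;

static sketch_sel_sizes_t sketch_sel_sizes(size_t n_rows) {
    sketch_sel_sizes_t s;
    s.n_segs = (n_rows + SEG_ROWS - 1) / SEG_ROWS;  /* round up */
    s.flags_bytes = s.n_segs * sizeof(uint8_t);
    s.popcnt_bytes = s.n_segs * sizeof(int32_t);
    s.bits_bytes = s.n_segs * SEG_WORDS * sizeof(uint64_t);
    return s;
}

/* Hypothetical function equivalents of the bit macros (LSB-first assumed). */
static int sketch_bit_test(const uint64_t *bits, size_t row) {
    return (int)((bits[row / 64] >> (row % 64)) & 1u);
}

static void sketch_bit_set(uint64_t *bits, size_t row) {
    bits[row / 64] |= (uint64_t)1 << (row % 64);
}

static void sketch_bit_clr(uint64_t *bits, size_t row) {
    bits[row / 64] &= ~((uint64_t)1 << (row % 64));
}
```

Under these assumptions the bit arrays dominate at 128 bytes per segment, against 5 bytes per segment for the flag and popcount together.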
Partitioned Columns
Partitioned tables split data across multiple segments (typically by date). Each column in a partitioned table uses a parted type that wraps multiple memory-mapped vector segments into a single logical column.
Type Encoding
Parted types are encoded as RAY_PARTED_BASE + base_type. For example, a partitioned I64 column has type 32 + 5 = 37. The base type is recovered with RAY_PARTED_BASETYPE(t).
| Constant | Value | Description |
|---|---|---|
| RAY_PARTED_BASE | 32 | Base offset for parted types |
| RAY_MAPCOMMON | 64 | Virtual partition column (e.g., date) |
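The encoding arithmetic is simple enough to show directly. The sketch below uses its own SKETCH_* constants and helper names rather than the real headers. The range check in sketch_is_parted assumes that every base type code is below 32, so that parted codes fall in the range 32 to 63; the text does not state this.

```c
/* Values taken from the table; the names are local to this sketch. */
#define SKETCH_PARTED_BASE 32
#define SKETCH_MAPCOMMON   64

/* Encode: parted type = base offset + base type code. */
static int sketch_parted_type(int base_type) {
    return SKETCH_PARTED_BASE + base_type;
}

/* Decode: what RAY_PARTED_BASETYPE(t) is described as recovering. */
static int sketch_parted_basetype(int t) {
    return t - SKETCH_PARTED_BASE;
}

/* Assumption: base type codes are < 32, so parted codes sit in [32, 64). */
static int sketch_is_parted(int t) {
    return t >= SKETCH_PARTED_BASE && t < SKETCH_MAPCOMMON;
}
```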
MAPCOMMON
When loading a date-partitioned table, Rayforce creates a virtual RAY_MAPCOMMON column. This column does not store actual data — it derives values from the partition directory names (e.g., 2024.01.15/). Each row in a partition shares the same date value, so the MAPCOMMON column can represent millions of rows with zero per-row storage.
// C API: load a date-partitioned table
ray_t* trades = ray_part_load("db", "trades");
// The 'date' column is MAPCOMMON — derived from directory names
// Queries that filter on date trigger partition pruning
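One way to picture a virtual column with zero per-row storage is a pair of small arrays, one value and one row count per partition, as in the sketch below. sketch_mapcommon_t and its helpers are hypothetical and say nothing about how Rayforce stores RAY_MAPCOMMON internally; the integer date encoding is also just a placeholder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical virtual column: one value and one row count per partition. */
typedef struct {
    size_t n_parts;
    const int32_t *part_value;  /* e.g. an encoded date, one per partition */
    const size_t *part_rows;    /* number of rows stored in each partition */
} sketch_mapcommon_t;

/* Logical length: the sum of all partition row counts. */
static size_t sketch_mc_len(const sketch_mapcommon_t *c) {
    size_t total = 0;
    for (size_t p = 0; p < c->n_parts; p++) total += c->part_rows[p];
    return total;
}

/* Value for a logical row index. Returns 0 on success and writes *out,
 * or -1 if the row is out of range. */
static int sketch_mc_get(const sketch_mapcommon_t *c, size_t row, int32_t *out) {
    for (size_t p = 0; p < c->n_parts; p++) {
        if (row < c->part_rows[p]) {
            *out = c->part_value[p];
            return 0;
        }
        row -= c->part_rows[p];  /* skip past this partition */
    }
    return -1;
}
```

Storage is proportional to the number of partitions, not the number of rows, and a filter on the column can be decided once per partition, which is the basis of partition pruning.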