Syntax & Types¶
Rayfall is a Lisp-like query language with prefix notation, rich scalar types, columnar vectors, and first-class tables. The parser produces ray_t objects directly with no separate AST.
Atoms¶
Atoms are scalar values. Rayfall supports a wide range of types, each with a distinct literal syntax.
Integers¶
64-bit signed integers by default. Suffixed variants available for narrower types.
Floats¶
64-bit IEEE 754 double-precision floating point.
Booleans¶
Boolean literals are true and false.
Symbols¶
Symbols are interned identifiers used for column names, dictionary keys, and categorical data. Prefix with a single quote to create a literal symbol.
Strings¶
Double-quoted character sequences. Two internal representations: short strings (up to 12 bytes) stored inline, longer strings in a per-vector pool.
"hello" ; inline short string
"hello world!" ; still inline (12 bytes)
"a longer string" ; pool-allocated
Dates¶
Date literals use dot-separated year.month.day format. Stored as days since 2000-01-01 (i32).
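The day-count encoding described above can be illustrated with a small standalone C sketch. The function name `days_since_2000` is hypothetical and not part of the Rayforce API; the arithmetic is the widely used civil-date day-count algorithm, shifted so that 2000-01-01 maps to 0.

```c
#include <assert.h>
#include <stdint.h>

/* Days between 1970-01-01 and 2000-01-01. */
#define EPOCH_2000_OFFSET 10957

/* Hypothetical helper: proleptic Gregorian date -> days since 2000-01-01. */
static int32_t days_since_2000(int y, int m, int d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2);                            /* count Jan/Feb as part of the previous year */
    int era = (y >= 0 ? y : y - 399) / 400;   /* index of the 400-year cycle */
    int yoe = y - era * 400;                  /* year within the cycle, 0..399 */
    int doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; /* March-based day of year */
    int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* day within the cycle */
    int days_since_1970 = era * 146097 + doe - 719468;
    return (int32_t)(days_since_1970 - EPOCH_2000_OFFSET);
}
```

Under this sketch, the literal 2024.03.15 would correspond to the stored value 8840.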
Times¶
Time-of-day literals in HH:MM:SS.mmm format. Stored as milliseconds since midnight (i32).
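As a rough illustration of the millisecond encoding, the following C sketch converts the components of an HH:MM:SS.mmm literal into an i32. The name `time_to_millis` is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time-of-day components -> milliseconds since midnight. */
static int32_t time_to_millis(int hh, int mm, int ss, int ms) {
    assert(hh >= 0 && hh < 24);
    assert(mm >= 0 && mm < 60);
    assert(ss >= 0 && ss < 60);
    assert(ms >= 0 && ms < 1000);
    /* The largest value, 23:59:59.999, is 86399999 and fits easily in 32 bits. */
    return (int32_t)(((hh * 60 + mm) * 60 + ss) * 1000 + ms);
}
```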
Timestamps¶
Full date+time with nanosecond precision. Stored as nanoseconds since 2000-01-01 (i64). The literal form uses D as the date/time separator and requires a fractional-seconds suffix.
GUIDs¶
128-bit globally unique identifiers.
Null Values¶
Nulls are tracked via a per-element bitmap, not sentinel values. Typed null literals create atoms with the null bit set:
0Nl ; i64 null
0Ni ; i32 null
0Nf ; f64 null
0Nd ; date null
0Nt ; time null
0Np ; timestamp null
0Ns ; symbol null
(nil? x) ; true if x is null
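To make the bitmap idea concrete, here is a minimal C sketch of a per-element null bitmap kept alongside a value array. The function names and the bit order (bit i of the bitmap marks element i) are assumptions for illustration, not the actual Rayforce layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mark element i as null (assumed convention: one bit per element, LSB first). */
static void bitmap_set_null(uint8_t *bm, size_t i) {
    assert(bm != NULL);
    bm[i >> 3] |= (uint8_t)(1u << (i & 7));
}

/* Test whether element i is null. */
static bool bitmap_is_null(const uint8_t *bm, size_t i) {
    assert(bm != NULL);
    return ((bm[i >> 3] >> (i & 7)) & 1u) != 0;
}

/* Count nulls among the first n elements. */
static size_t bitmap_count_nulls(const uint8_t *bm, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (bitmap_is_null(bm, i)) count++;
    }
    return count;
}
```

Because nullness lives in the bitmap, a stored value such as 0 stays distinguishable from a null in the same slot.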
Vectors¶
Vectors are homogeneous, typed, columnar arrays. Created with square brackets. The type is inferred from the first element.
[1 2 3 4 5] ; i64 vector
[1.5 2.7 3.9] ; f64 vector
[true false true] ; boolean vector
[AAPL GOOG MSFT] ; symbol vector
["hello" "world"] ; string vector
Vector operations are morsel-driven, processing 1024 elements at a time for cache efficiency.
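The morsel-at-a-time pattern can be sketched in plain C as follows. This is only an illustration of chunked processing with a 1024-element morsel; `sum_by_morsels` and `MORSEL_SIZE` are hypothetical names, and the real executor is not shown in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MORSEL_SIZE 1024  /* elements per morsel, as stated above */

/* Sum a column one morsel at a time; optionally report how many morsels ran. */
static int64_t sum_by_morsels(const int64_t *data, size_t n, size_t *morsels_out) {
    assert(data != NULL || n == 0);
    int64_t total = 0;
    size_t morsels = 0;
    for (size_t start = 0; start < n; start += MORSEL_SIZE) {
        size_t end = start + MORSEL_SIZE;
        if (end > n) end = n;               /* the last morsel may be short */
        for (size_t i = start; i < end; i++) {
            total += data[i];               /* tight inner loop over one morsel */
        }
        morsels++;
    }
    if (morsels_out) *morsels_out = morsels;
    return total;
}
```

A 3000-element column would be processed as three morsels of 1024, 1024, and 952 elements.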
Vector Arithmetic¶
All arithmetic operators auto-map over vectors (marked FN_ATOMIC). For example, (+ [1 2] [3 4]) yields [4 6].
Lists¶
Lists are heterogeneous collections of vectors. Created with the list function. Used as the data component of tables.
Tables¶
Tables are the core data structure in Rayforce. A table is a vector of column names paired with a list of column vectors. All column vectors must have the same length.
; Create a table with explicit column names
(set trades (table
[sym price size]
(list
[AAPL GOOG MSFT]
[150.5 2800.0 300.2]
[100 50 200])))
; Access column names
(key trades) ; [sym price size]
; Access column data
(value trades) ; list of 3 vectors
Dictionaries¶
Dictionaries map keys to values. Created with the dict function or with {key: value} syntax in query contexts.
Function Calls¶
Rayfall uses prefix (Polish) notation. Every expression is either an atom or a parenthesized list where the first element is the function; for example, (+ 1 2) applies + to 1 and 2.
Function Types¶
Built-in functions fall into three arity categories:
| Type | Arguments | Examples |
|---|---|---|
| Unary | Exactly 1 | sum, count, not, neg, type |
| Binary | Exactly 2 | +, -, set, take, at |
| Variadic | 1 or more | if, do, fn, select, list |
Function Flags¶
| Flag | Behavior |
|---|---|
| FN_ATOMIC | Auto-maps element-wise over vectors. (+ [1 2] [3 4]) yields [4 6]. |
| FN_AGGR | Aggregation function. Reduces a vector to a scalar. (sum [1 2 3]) yields 6. |
| FN_SPECIAL_FORM | Arguments are not evaluated before being passed. Used by set, if, fn, select. |
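The difference between an element-wise (FN_ATOMIC-style) function and an aggregating (FN_AGGR-style) one can be sketched in C. The names `add_i64` and `sum_i64` are invented for this example; it only mirrors the two behaviors in the table, not the real dispatch code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element-wise: output has the same length as the inputs. */
static void add_i64(const int64_t *a, const int64_t *b, int64_t *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Aggregation: a whole vector collapses to one scalar. */
static int64_t sum_i64(const int64_t *v, size_t n) {
    assert(v != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}
```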
Quoting¶
The single quote ' prevents evaluation, creating a symbol atom. This is useful for column references and dictionary keys.
Comments¶
Line comments start with a semicolon and extend to the end of the line.
Control Flow¶
Conditional: if¶
Evaluates the condition and returns the true or false branch. Additional condition/result pairs can be chained before the final else expression.
Sequential Execution: do¶
Evaluates multiple expressions in order, returning the last result.
Variable Binding: set and let¶
Dotted Namespaces¶
A symbol whose name contains one or more . is a dotted symbol — it names a path through nested dictionaries rather than a single global binding. set auto-creates the intermediate dicts on write, and read/delete traverse them with copy-on-write semantics so other references to the enclosing dict see their old value.
; Write a nested namespace — intermediate dicts are created automatically
(set math.pi 3.14159)
(set math.e 2.71828)
(set cfg.db.host "localhost")
(set cfg.db.port 5432)
; Read walks the path segment by segment
math.pi ; 3.14159
cfg.db.host ; "localhost"
; Deleting a leaf cascades: cfg.db becomes empty, then cfg.db is itself removed
(del cfg.db.host)
(del cfg.db.port)
cfg.db ; error — cleaned up after last leaf went away
When a dotted path lands on a temporal value (DATE / TIME / TIMESTAMP atom or vector), the trailing segment dispatches through the temporal extraction API instead of looking for a dict key. This lets you reach calendar fields uniformly:
(set d 2024.03.15)
d.yyyy ; 2024
d.mm ; 3
d.dd ; 15
d.dow ; 5 (ISO: Mon=1..Sun=7)
; Vector of dates — the whole column is lifted
(set ds [2024.01.01 2024.06.15 2024.12.31])
ds.yyyy ; [2024 2024 2024]
ds.doy ; [1 167 366]
; Inside a table, col.yyyy resolves against the column vector
(select {from: trades by: Ts.date}) ; group by day
(select {from: trades by: Ts.hh}) ; group by hour of day
Recognised temporal segments: yyyy, mm, dd, hh, minute, ss, dow, doy, plus the two truncations date (drop time-of-day) and time (drop date). Null input rows propagate to null output rows, so a null is never decoded as a bogus calendar value.
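As a sketch of what the calendar segments compute, the C function below derives yyyy, mm, dd, dow, and doy from a day count relative to 2000-01-01. The type `date_parts_t` and function `date_extract` are hypothetical, null handling is omitted, and the real temporal extraction API may work differently.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int yyyy, mm, dd, dow, doy; } date_parts_t;

/* Hypothetical helper: days since 2000-01-01 -> calendar fields. */
static date_parts_t date_extract(int32_t days_since_2000) {
    int64_t z = (int64_t)days_since_2000 + 730425;      /* shift epoch to 0000-03-01 */
    int64_t era = (z >= 0 ? z : z - 146096) / 146097;   /* 400-year cycle index */
    int64_t doe = z - era * 146097;                     /* day within the cycle */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    int64_t mdoy = doe - (365 * yoe + yoe / 4 - yoe / 100); /* March-based day of year */
    int64_t mp = (5 * mdoy + 2) / 153;                  /* March-based month, 0..11 */
    date_parts_t p;
    p.dd = (int)(mdoy - (153 * mp + 2) / 5 + 1);
    p.mm = (int)(mp < 10 ? mp + 3 : mp - 9);
    p.yyyy = (int)(yoe + era * 400 + (p.mm <= 2));
    assert(p.mm >= 1 && p.mm <= 12);
    /* 2000-01-01 was a Saturday, which is 6 in ISO numbering (Mon=1..Sun=7). */
    p.dow = (int)(((days_since_2000 % 7) + 7 + 5) % 7) + 1;
    /* January-based day of year from cumulative month lengths. */
    static const int cum[12] = {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};
    int leap = (p.yyyy % 4 == 0 && p.yyyy % 100 != 0) || p.yyyy % 400 == 0;
    p.doy = cum[p.mm - 1] + p.dd + ((leap && p.mm > 2) ? 1 : 0);
    return p;
}
```

With this sketch, day count 8840 (2024.03.15) yields dow 5, matching the example above.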
Reserved .* Namespace¶
Symbols whose name starts with . are a reserved system namespace populated at startup by builtin registration. Typing one of these at the REPL returns the namespace dict for introspection:
.sys ; {gc:<.sys.gc> exec:<.sys.exec> info:<.sys.info> mem:<.sys.mem> build:<.sys.build>}
.os ; {getenv:<.os.getenv> setenv:<.os.setenv>}
.ipc ; {open:<.ipc.open> close:<.ipc.close> send:<.ipc.send>}
.csv ; {read:<.csv.read> write:<.csv.write>}
Every .-prefixed name is protected: set, let, lambda parameters, and del all refuse such names with error: reserve. This keeps user code from shadowing the builtin surface, regardless of where it's bound.
(set .os.foo 1) ; error: reserve
(let .sys.gc 99) ; error: reserve
((fn [.sys.gc] .sys.gc) 7) ; error: reserve (lambda parameter name)
(del .sys.gc) ; error: reserve: cannot delete reserved binding
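The naming rules from the last two sections (plain, dotted, and reserved symbols) can be summarized in a short C sketch. Everything here, including `classify_symbol`, `split_segments`, and the 32-byte segment limit, is an assumption made for illustration rather than Rayforce's actual resolver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEG_MAX 32  /* illustrative limit on one path segment, including the NUL */

typedef enum { SYM_PLAIN, SYM_DOTTED, SYM_RESERVED } sym_kind_t;

/* A leading '.' marks the reserved namespace; any other '.' makes a dotted path. */
static sym_kind_t classify_symbol(const char *name) {
    assert(name != NULL);
    if (name[0] == '.') return SYM_RESERVED;
    return strchr(name, '.') ? SYM_DOTTED : SYM_PLAIN;
}

/* Split "cfg.db.host" into {"cfg", "db", "host"}; returns the segment count. */
static size_t split_segments(const char *name, char out[][SEG_MAX], size_t max_segs) {
    size_t n = 0;
    const char *p = name;
    while (*p != '\0' && n < max_segs) {
        const char *dot = strchr(p, '.');
        size_t len = dot ? (size_t)(dot - p) : strlen(p);
        if (len >= SEG_MAX) len = SEG_MAX - 1;   /* truncate overly long segments */
        memcpy(out[n], p, len);
        out[n][len] = '\0';
        n++;
        if (dot == NULL) break;
        p = dot + 1;
    }
    return n;
}
```

A binder following the rules above would reject any name classified as `SYM_RESERVED` before attempting a write or delete.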
Error Handling: try / raise¶
(try
(raise "oops") ; throws an error
(fn [e] "caught")) ; handler receives error → "caught"
(raise "custom error") ; throw an error
Lambdas & the VM¶
User-defined functions are created with fn. Lambdas compile lazily to bytecode and run in a stack-based computed-goto VM (ray_vm_t) with a 1024-slot program stack and return stack.
; Named function
(set square (fn [x] (* x x)))
(square 5) ; 25
; Chained conditional: condition/result pairs, then a default
(set clamp (fn [x lo hi]
(if (< x lo) lo
(> x hi) hi
x)))
; Anonymous lambda passed to map
(map (fn [x] (* x 2)) [1 2 3]) ; [2 4 6]
The VM supports trap frames for try/raise error handling, ensuring exceptions unwind cleanly through compiled code.
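For readers unfamiliar with stack-based VMs, here is a toy C interpreter that runs the bytecode a lambda like (fn [x] (* x x)) might compile to. The opcode names and instruction format are invented for this sketch, it uses a portable switch rather than computed goto, and it omits the return stack and trap frames that ray_vm_t is described as having.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VM_STACK_SLOTS 1024  /* same slot count as the stack described above */

typedef enum { VM_LOAD_ARG, VM_PUSH, VM_MUL, VM_ADD, VM_RET } vm_op_t;
typedef struct { vm_op_t op; int64_t operand; } vm_instr_t;

/* Execute instructions until VM_RET; returns the value on top of the stack. */
static int64_t vm_run(const vm_instr_t *code, const int64_t *args) {
    int64_t stack[VM_STACK_SLOTS];
    size_t sp = 0;  /* index of the next free slot */
    for (size_t pc = 0;; pc++) {
        switch (code[pc].op) {
        case VM_LOAD_ARG:                 /* push argument number `operand` */
            assert(sp < VM_STACK_SLOTS);
            stack[sp++] = args[code[pc].operand];
            break;
        case VM_PUSH:                     /* push an immediate constant */
            assert(sp < VM_STACK_SLOTS);
            stack[sp++] = code[pc].operand;
            break;
        case VM_MUL:                      /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case VM_ADD:                      /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case VM_RET:
            assert(sp >= 1);
            return stack[sp - 1];
        }
    }
}
```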
Select & Update¶
The select and update builtins bridge to the Rayforce DAG executor. They accept a dictionary of options:
select¶
; Basic filter
(select {from: trades where: (> price 100)})
; Project specific columns with expressions
(select {from: trades
cols: {sym: sym notional: (* price size)}})
; Group by with aggregation
(select {from: trades
by: {sym: sym}
cols: {avg_price: (avg price)
total_size: (sum size)}})
update¶
insert / upsert¶
; Insert new rows
(insert {into: trades
values: (table [sym price size]
(list [TSLA] [250.0] [300]))})
C API¶
Rayforce exposes a single public header: include/rayforce.h. The core abstraction is ray_t — a 32-byte block header. Every object (atom, vector, list, table) is a ray_t with data following at byte 32.
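The "header, then data at byte 32" layout can be pictured with the following C sketch. The field names and their order are purely hypothetical (the actual ray_t definition lives in include/rayforce.h and is not reproduced here); only the 32-byte size and the data offset come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte block header; NOT the real ray_t field layout. */
typedef struct {
    uint32_t refcount;     /* reference count */
    uint8_t  type;         /* element type tag */
    uint8_t  attrs;        /* attribute flags */
    uint16_t heap_id;      /* owning heap */
    uint64_t len;          /* number of elements */
    uint64_t reserved[2];  /* padding up to 32 bytes */
} block_hdr_t;

_Static_assert(sizeof(block_hdr_t) == 32, "header must be exactly 32 bytes");

/* Payload starts immediately after the header, at byte offset 32. */
static void *block_data(block_hdr_t *h) {
    assert(h != NULL);
    return (unsigned char *)h + sizeof *h;
}
```

Keeping the header a fixed size means the payload address is always a constant offset from the object pointer, regardless of the object's type.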
Key Types¶
| Type | Description |
|---|---|
| ray_t | 32-byte universal block header for all objects |
| ray_err_t | Error code return type |
| ray_str_t | 16-byte string element (inline or pooled) |
| ray_csr_t | CSR graph edge storage |
| ray_rel_t | Graph relationship (forward + reverse CSR) |
| ray_arena_t | Bump allocator for bulk allocations |
| ray_vm_t | Bytecode VM for compiled lambdas |
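To illustrate how a 16-byte string element can hold strings of up to 12 bytes inline and spill longer ones to a pool, here is a minimal C sketch. The layout, the names `str_elem_t`, `str_make`, and `str_is_inline`, and the caller-provided pool buffer are all assumptions; the real ray_str_t may be organized differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STR_INLINE_MAX 12  /* inline threshold in bytes */

/* Hypothetical 16-byte element: 4-byte length plus 12 bytes of inline data
   or pool reference. */
typedef struct {
    uint32_t len;
    union {
        char inline_bytes[STR_INLINE_MAX];
        struct { uint32_t pool_offset; uint32_t unused[2]; } pooled;
    } u;
} str_elem_t;

static bool str_is_inline(const str_elem_t *s) {
    return s->len <= STR_INLINE_MAX;
}

/* Store text inline when it fits; otherwise append it to the pool. */
static str_elem_t str_make(const char *text, char *pool,
                           uint32_t pool_cap, uint32_t *pool_used) {
    str_elem_t s;
    memset(&s, 0, sizeof s);
    s.len = (uint32_t)strlen(text);
    if (s.len <= STR_INLINE_MAX) {
        memcpy(s.u.inline_bytes, text, s.len);
    } else {
        assert(*pool_used + s.len <= pool_cap);   /* sketch: no pool growth */
        s.u.pooled.pool_offset = *pool_used;
        memcpy(pool + *pool_used, text, s.len);
        *pool_used += s.len;
    }
    return s;
}
```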
Error Handling¶
ray_t* result = ray_eval_str("(+ 1 2)");
if (RAY_IS_ERR(result)) {
// handle error
}
// RAY_ERR_PTR() to create error pointers
Memory Management¶
Never use malloc/free. Use the Rayforce allocator:
ray_t* obj = ray_alloc(size); // general allocation
ray_release(obj); // decrement refcount, free if zero
ray_retain(obj); // increment refcount
ray_t* copy = ray_cow(obj); // copy-on-write
DAG & Execution¶
The execution pipeline builds a lazy DAG, optimizes it, then executes with fused morsel-driven processing:
// 1. Build lazy DAG
ray_t* g = ray_graph_new(df);
ray_t* filtered = ray_filter(g, predicate);
ray_t* projected = ray_project(g, filtered, cols);
// 2. Execute (optimizer runs automatically)
ray_t* result = ray_execute(g, projected);
Optimizer Passes¶
- Type inference — propagate types through the DAG
- Constant folding — evaluate compile-time-known expressions
- SIP (Sideways Information Passing) — propagate selection bitmaps backward through expand chains
- Factorize — avoid materializing cross-products with factorized vectors
- Predicate pushdown — move filters closer to data sources
- Filter reorder — cheapest filters first
- Fusion — merge adjacent operations into single morsel loops
- DCE (Dead Code Elimination) — remove unused DAG nodes
CSR Storage¶
Rayforce stores graph edges in double-indexed Compressed Sparse Row (CSR) format: one forward index (source to destination) and one reverse index (destination to source). Both indices are built simultaneously.
// Build CSR from edge list
ray_csr_t csr;
ray_csr_build(&csr, src_ids, dst_ids, n_edges);
// Persist to disk
ray_csr_save(&csr, "edges");
// Memory-map for zero-copy access
ray_csr_mmap(&csr, "edges");
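For intuition about what a CSR build does, the sketch below constructs an offsets/targets pair from an edge list with a counting pass and a prefix sum. It is a standalone illustration, not the body of ray_csr_build: the name `csr_build`, the argument order, and the caller-allocated arrays are assumptions, and it produces the reverse index with a second call rather than building both simultaneously.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build one CSR index. offsets needs n_nodes + 1 slots, targets needs n_edges.
   After the call, the neighbors of node u are targets[offsets[u] .. offsets[u+1]-1]. */
static void csr_build(const uint32_t *src, const uint32_t *dst, size_t n_edges,
                      size_t n_nodes, uint32_t *offsets, uint32_t *targets) {
    assert(offsets != NULL && (targets != NULL || n_edges == 0));
    /* 1. Count the out-degree of every node, stored one slot to the right. */
    for (size_t i = 0; i <= n_nodes; i++) offsets[i] = 0;
    for (size_t e = 0; e < n_edges; e++) {
        assert(src[e] < n_nodes && dst[e] < n_nodes);
        offsets[src[e] + 1]++;
    }
    /* 2. Prefix sum turns counts into start positions. */
    for (size_t i = 0; i < n_nodes; i++) offsets[i + 1] += offsets[i];
    /* 3. Scatter edges, using offsets[u] as a moving write cursor. */
    for (size_t e = 0; e < n_edges; e++) {
        targets[offsets[src[e]]++] = dst[e];
    }
    /* 4. Each cursor now sits at the next node's start; shift back by one. */
    for (size_t i = n_nodes; i > 0; i--) offsets[i] = offsets[i - 1];
    offsets[0] = 0;
}
```

Calling the function a second time with the source and destination arrays swapped gives the reverse (destination to source) index.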
Graph Algorithms¶
Available as DAG opcodes, all integrated into the same morsel-driven pipeline:
| Opcode | Algorithm | Description |
|---|---|---|
| OP_EXPAND | 1-Hop Expand | Follow edges one step from source nodes |
| OP_VAR_EXPAND | BFS | Variable-length path expansion (breadth-first) |
| OP_SHORTEST_PATH | Shortest Path | Single-source shortest paths |
| OP_ASTAR | A* | Heuristic-guided shortest path |
| OP_K_SHORTEST | Yen's K-Shortest | K shortest loopless paths |
| OP_WCO_JOIN | LFTJ | Worst-case optimal join (Leapfrog Triejoin) |
| OP_BETWEENNESS | Brandes | Betweenness centrality |
| OP_CLOSENESS | Closeness | Closeness centrality |
| OP_CLUSTER_COEFF | Clustering | Local clustering coefficients |
| OP_RANDOM_WALK | Random Walk | Random walks on graph |
| OP_MST | Kruskal | Minimum spanning tree |
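As an example of how one of these algorithms operates on CSR data, the following C sketch runs a hop-limited breadth-first expansion over an offsets/targets pair. It is a generic BFS written for illustration; `bfs_hops`, `UNREACHED`, and the caller-provided scratch queue are hypothetical and do not describe the OP_VAR_EXPAND implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNREACHED UINT32_MAX

/* Fill hops[v] with the BFS distance from start, visiting at most max_hops
   levels. hops and queue must each have room for n_nodes entries. */
static void bfs_hops(const uint32_t *offsets, const uint32_t *targets,
                     size_t n_nodes, uint32_t start, uint32_t max_hops,
                     uint32_t *hops, uint32_t *queue) {
    assert(start < n_nodes);
    for (size_t i = 0; i < n_nodes; i++) hops[i] = UNREACHED;
    size_t head = 0, tail = 0;
    hops[start] = 0;
    queue[tail++] = start;
    while (head < tail) {
        uint32_t u = queue[head++];
        if (hops[u] == max_hops) continue;      /* do not expand past the limit */
        for (uint32_t e = offsets[u]; e < offsets[u + 1]; e++) {
            uint32_t v = targets[e];
            if (hops[v] == UNREACHED) {         /* first visit is the shortest */
                hops[v] = hops[u] + 1;
                queue[tail++] = v;
            }
        }
    }
}
```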
Pipeline & Optimizer¶
The full execution pipeline:
Rayfall source
| parse (ASCII dispatch table, recursive descent)
v
ray_t objects (no separate AST)
| ray_eval() / bytecode VM
v
Lazy DAG construction
| ray_graph_new() -> ray_scan/ray_add/ray_filter/...
v
Optimizer (8 passes)
| type inference -> constant fold -> SIP -> factorize
| -> predicate pushdown -> filter reorder -> fusion -> DCE
v
Fused morsel-driven executor
| bytecode over register slots, 1024 elements per morsel
v
Result (ray_t)
Memory Model¶
- Buddy allocator with thread-local arenas for contention-free allocation
- Slab cache for small, frequently-allocated objects
- COW ref counting: ray_cow() returns a private copy only when the refcount exceeds 1
- Arena (bump) allocator (ray_arena_t) for bulk short-lived allocations; blocks carry RAY_ATTR_ARENA, making retain/release no-ops
- Per-VM heaps: each heap carries a heap_id (u16); cross-heap frees enqueue to a lock-free LIFO, reclaimed via ray_heap_flush_foreign()
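The copy-on-write rule in the list above (copy only when the refcount exceeds 1) can be sketched in standalone C. The names `obj_t`, `obj_new`, `obj_retain`, and `obj_cow` are hypothetical. Two simplifications are assumed: the sketch uses calloc because it cannot call the Rayforce allocator, and it hands the caller's reference over to the copy, which may not match how ray_cow() treats the original.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t refcount;
    size_t   len;
    int64_t  data[];   /* payload follows the header */
} obj_t;

static obj_t *obj_new(size_t len) {
    obj_t *o = calloc(1, sizeof *o + len * sizeof(int64_t));
    assert(o != NULL);
    o->refcount = 1;
    o->len = len;
    return o;
}

static void obj_retain(obj_t *o) { o->refcount++; }

/* Return an object that is safe to mutate in place. */
static obj_t *obj_cow(obj_t *o) {
    if (o->refcount <= 1) return o;        /* sole owner: no copy needed */
    obj_t *copy = obj_new(o->len);
    memcpy(copy->data, o->data, o->len * sizeof(int64_t));
    o->refcount--;                         /* assumption: caller's ref moves to the copy */
    return copy;
}
```

Writes through the returned pointer then leave every other holder of the original object unaffected.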
Files & Partitions¶
- Column files — native binary format for vectors and CSR graphs, supports mmap
- Sym table — global string intern table, arena-backed, append-only persistence with file locking
- CSV loader — mmap-based, parallel parse, automatic type inference, null handling, sym merge
- File I/O — cross-platform locking (flock/LockFileEx), fsync, atomic rename