Syntax & Types
Rayfall is a Lisp-like query language with prefix notation, rich scalar types, columnar vectors, and first-class tables. The parser produces ray_t objects directly with no separate AST.
Atoms
Atoms are scalar values. Rayfall supports a wide range of types, each with a distinct literal syntax.
Integers
64-bit signed integers by default. Suffixed variants available for narrower types.
42 ; i64 (default)
-17 ; negative
0 ; zero
1000000 ; no separators needed
Floats
64-bit IEEE 754 double-precision floating point.
3.14 ; standard float
-0.5 ; negative float
1e6 ; scientific notation
2.5e-3 ; small value
Booleans
true ; boolean true (1b)
false ; boolean false (0b)
Symbols
Symbols are interned identifiers used for column names, dictionary keys, and categorical data. Prefix with a single quote to create a literal symbol.
'AAPL ; symbol atom
'price ; used as column reference
'hello ; any alphanumeric + hyphens
Strings
Double-quoted character sequences. Two internal representations: short strings (up to 12 bytes) stored inline, longer strings in a per-vector pool.
"hello" ; inline short string
"hello world!" ; still inline (12 bytes)
"a longer string" ; pool-allocated
Dates
Date literals use dot-separated year.month.day format. Stored as days since 2000-01-01 (i32).
2024.01.15 ; January 15, 2024
2023.12.31 ; December 31, 2023
Times
Time-of-day literals in HH:MM:SS.mmm format. Stored as milliseconds since midnight (i32).
09:30:00.000 ; 9:30 AM
14:15:30.500 ; 2:15:30.5 PM
Timestamps
Full date+time with nanosecond precision. Stored as nanoseconds since 2000-01-01 (i64). Literal form uses D as the date/time separator and requires a fractional-seconds suffix:
2024.01.15D09:30:00.000 ; date + time (ms)
2024.01.15D09:30:00.000000000 ; date + time (ns)
GUIDs
128-bit globally unique identifiers.
(guid 0) ; generate a new GUID
Null Values
Nulls are tracked via a per-element bitmap, not sentinel values. Typed null literals create atoms with the null bit set:
0Nl ; i64 null
0Ni ; i32 null
0Nf ; f64 null
0Nd ; date null
0Nt ; time null
0Np ; timestamp null
0Ns ; symbol null
(nil? x) ; true if x is null
Vectors
Vectors are homogeneous, typed, columnar arrays. Created with square brackets. The type is inferred from the first element.
[1 2 3 4 5] ; i64 vector
[1.5 2.7 3.9] ; f64 vector
[true false true] ; boolean vector
[AAPL GOOG MSFT] ; symbol vector
["hello" "world"] ; string vector
Vector operations are morsel-driven, processing 1024 elements at a time for cache efficiency.
Vector Arithmetic
All arithmetic operators auto-map over vectors (marked FN_ATOMIC):
(+ [1 2 3] [10 20 30]) ; [11 22 33]
(* [1 2 3] 10) ; [10 20 30] — scalar broadcast
Lists
Lists are heterogeneous collections of vectors. Created with the list function. Used as the data component of tables.
(list [1 2 3] [A B C]) ; list of two vectors
Tables
Tables are the core data structure in Rayforce. A table is a vector of column names paired with a list of column vectors. All column vectors must have the same length.
; Create a table with explicit column names
(set trades (table
[sym price size]
(list
[AAPL GOOG MSFT]
[150.5 2800.0 300.2]
[100 50 200])))
; Access column names
(key trades) ; [sym price size]
; Access column data
(value trades) ; list of 3 vectors
Dictionaries
Dictionaries map keys to values. Created with the dict function or with {key: value} syntax in query contexts.
(set d (dict [a b c] [1 2 3]))
(get d 'a) ; 1
(key d) ; [a b c]
(value d) ; [1 2 3]
Function Calls
Rayfall uses prefix (Polish) notation. Every expression is either an atom or a parenthesized list where the first element is the function:
(+ 1 2) ; 3
(* (+ 1 2) 3) ; 9 — nested
(sum [1 2 3]) ; 6
(count [1 2 3]) ; 3
Function Types
Built-in functions fall into three arity categories:
| Type | Arguments | Examples |
|---|---|---|
| Unary | Exactly 1 | sum, count, not, neg, type |
| Binary | Exactly 2 | +, -, set, take, at |
| Variadic | 1 or more | if, do, fn, select, list |
Function Flags
| Flag | Behavior |
|---|---|
FN_ATOMIC | Auto-maps element-wise over vectors. (+ [1 2] [3 4]) yields [4 6]. |
FN_AGGR | Aggregation function. Reduces a vector to a scalar. (sum [1 2 3]) yields 6. |
FN_SPECIAL_FORM | Arguments are not evaluated before being passed. Used by set, if, fn, select. |
Quoting
The single quote ' prevents evaluation, creating a symbol atom. Useful for column references and dictionary keys:
'price ; symbol, not a variable lookup
(quote (+ 1 2)) ; returns the unevaluated list (+ 1 2)
Comments
Line comments start with a semicolon and extend to the end of the line:
; This is a comment
(+ 1 2) ; inline comment
Control Flow
Conditional: if
Evaluates the condition and returns the true or false branch. Supports if/then/else chaining:
(if (> x 0) "positive" "non-positive")
; Multi-branch
(if (> x 100) "high"
(> x 50) "mid"
"low")
Sequential Execution: do
Evaluates multiple expressions in order, returning the last result:
(do
(set x 10)
(set y 20)
(+ x y)) ; 30
Variable Binding: set and let
(set x 42) ; global binding
(let y (+ x 1)) ; local binding
Dotted Namespaces
A symbol whose name contains one or more . is a dotted symbol — it names a path through nested dictionaries rather than a single global binding. set auto-creates the intermediate dicts on write, and read/delete traverse them with copy-on-write semantics so other references to the enclosing dict see their old value.
; Write a nested namespace — intermediate dicts are created automatically
(set math.pi 3.14159)
(set math.e 2.71828)
(set cfg.db.host "localhost")
(set cfg.db.port 5432)
; Read walks the path segment by segment
math.pi ; 3.14159
cfg.db.host ; "localhost"
; Deleting a leaf cascades: cfg.db becomes empty, then cfg.db is itself removed
(del cfg.db.host)
(del cfg.db.port)
cfg.db ; error — cleaned up after last leaf went away
When a dotted path lands on a temporal value (DATE / TIME / TIMESTAMP atom or vector), the trailing segment dispatches through the temporal extraction API instead of looking for a dict key. This lets you reach calendar fields uniformly:
(set d 2024.03.15)
d.yyyy ; 2024
d.mm ; 3
d.dd ; 15
d.dow ; 5 (ISO: Mon=1..Sun=7)
; Vector of dates — the whole column is lifted
(set ds [2024.01.01 2024.06.15 2024.12.31])
ds.yyyy ; [2024 2024 2024]
ds.doy ; [1 167 366]
; Inside a table, col.yyyy resolves against the column vector
(select {from: trades by: Ts.date}) ; group by day
(select {from: trades by: Ts.hh}) ; group by hour of day
Recognised temporal segments: yyyy, mm, dd, hh, minute, ss, dow, doy, plus the two truncations date (drop time-of-day) and time (drop date). Null input rows propagate to null output rows — the null sentinel bit pattern is not decoded as a bogus calendar value.
Reserved .* Namespace
Symbols whose name starts with . are a reserved system namespace populated at startup by builtin registration. Typing one of these at the REPL returns the namespace dict for introspection:
.sys ; {gc:<.sys.gc> exec:<.sys.exec> info:<.sys.info> mem:<.sys.mem> build:<.sys.build>}
.os ; {getenv:<.os.getenv> setenv:<.os.setenv>}
.ipc ; {open:<.ipc.open> close:<.ipc.close> send:<.ipc.send>}
.csv ; {read:<.csv.read> write:<.csv.write>}
Every .-prefixed name is protected: set, let, lambda parameters, and del all refuse such names with error: reserve. This keeps user code from shadowing the builtin surface, regardless of where it's bound.
(set .os.foo 1) ; error: reserve
(let .sys.gc 99) ; error: reserve
((fn [.sys.gc] .sys.gc) 7) ; error: reserve (lambda parameter name)
(del .sys.gc) ; error: reserve: cannot delete reserved binding
Error Handling: try / raise
(try
(raise "oops") ; throws an error
(fn [e] "caught")) ; handler receives error → "caught"
(raise "custom error") ; throw an error
Lambdas & the VM
User-defined functions are created with fn. Lambdas compile lazily to bytecode and run in a stack-based computed-goto VM (ray_vm_t) with a 1024-slot program stack and return stack.
; Named function
(set square (fn [x] (* x x)))
(square 5) ; 25
; Multi-expression body
(set clamp (fn [x lo hi]
(if (< x lo) lo
(> x hi) hi
x)))
; Anonymous lambda passed to map
(map (fn [x] (* x 2)) [1 2 3]) ; [2 4 6]
The VM supports trap frames for try/raise error handling, ensuring exceptions unwind cleanly through compiled code.
Select & Update
The select and update builtins bridge to the Rayforce DAG executor. They accept a dictionary of options:
select
; Basic filter
(select {from: trades where: (> price 100)})
; Project specific columns with expressions
(select {from: trades
cols: {sym: sym notional: (* price size)}})
; Group by with aggregation
(select {from: trades
by: {sym: sym}
cols: {avg_price: (avg price)
total_size: (sum size)}})
update
; Add a computed column
(update {from: trades
cols: {notional: (* price size)}})
insert / upsert
; Insert new rows
(insert {into: trades
values: (table [sym price size]
(list [TSLA] [250.0] [300]))})
C API
Rayforce exposes a single public header: include/rayforce.h. The core abstraction is ray_t — a 32-byte block header. Every object (atom, vector, list, table) is a ray_t with data following at byte 32.
Key Types
| Type | Description |
|---|---|
ray_t | 32-byte universal block header for all objects |
ray_err_t | Error code return type |
ray_str_t | 16-byte string element (inline or pooled) |
ray_csr_t | CSR graph edge storage |
ray_rel_t | Graph relationship (forward + reverse CSR) |
ray_arena_t | Bump allocator for bulk allocations |
ray_vm_t | Bytecode VM for compiled lambdas |
Error Handling
ray_t* result = ray_eval_str("(+ 1 2)");
if (RAY_IS_ERR(result)) {
// handle error
}
// RAY_ERR_PTR() to create error pointers
Memory Management
Never use malloc/free. Use the Rayforce allocator:
ray_t* obj = ray_alloc(size); // general allocation
ray_release(obj); // decrement refcount, free if zero
ray_retain(obj); // increment refcount
ray_t* copy = ray_cow(obj); // copy-on-write
DAG & Execution
The execution pipeline builds a lazy DAG, optimizes it, then executes with fused morsel-driven processing:
// 1. Build lazy DAG
ray_t* g = ray_graph_new(df);
ray_t* filtered = ray_filter(g, predicate);
ray_t* projected = ray_project(g, filtered, cols);
// 2. Execute (optimizer runs automatically)
ray_t* result = ray_execute(g, projected);
Optimizer Passes
- Type inference — propagate types through the DAG
- Constant folding — evaluate compile-time-known expressions
- SIP (Sideways Information Passing) — propagate selection bitmaps backward through expand chains
- Factorize — avoid materializing cross-products with factorized vectors
- Predicate pushdown — move filters closer to data sources
- Filter reorder — cheapest filters first
- Fusion — merge adjacent operations into single morsel loops
- DCE (Dead Code Elimination) — remove unused DAG nodes
CSR Storage
Rayforce stores graph edges in double-indexed Compressed Sparse Row (CSR) format: one forward index (source to destination) and one reverse index (destination to source). Both indices are built simultaneously.
// Build CSR from edge list
ray_csr_t csr;
ray_csr_build(&csr, src_ids, dst_ids, n_edges);
// Persist to disk
ray_csr_save(&csr, "edges");
// Memory-map for zero-copy access
ray_csr_mmap(&csr, "edges");
Graph Algorithms
Available as DAG opcodes, all integrated into the same morsel-driven pipeline:
| Opcode | Algorithm | Description |
|---|---|---|
OP_EXPAND | 1-Hop Expand | Follow edges one step from source nodes |
OP_VAR_EXPAND | BFS | Variable-length path expansion (breadth-first) |
OP_SHORTEST_PATH | Shortest Path | Single-source shortest paths |
OP_ASTAR | A* | Heuristic-guided shortest path |
OP_K_SHORTEST | Yen's K-Shortest | K shortest loopless paths |
OP_WCO_JOIN | LFTJ | Worst-case optimal join (Leapfrog Triejoin) |
OP_BETWEENNESS | Brandes | Betweenness centrality |
OP_CLOSENESS | Closeness | Closeness centrality |
OP_CLUSTER_COEFF | Clustering | Local clustering coefficients |
OP_RANDOM_WALK | Random Walk | Random walks on graph |
OP_MST | Kruskal | Minimum spanning tree |
Pipeline & Optimizer
The full execution pipeline:
Rayfall source
| parse (ASCII dispatch table, recursive descent)
v
ray_t objects (no separate AST)
| ray_eval() / bytecode VM
v
Lazy DAG construction
| ray_graph_new() -> ray_scan/ray_add/ray_filter/...
v
Optimizer (8 passes)
| type inference -> constant fold -> SIP -> factorize
| -> predicate pushdown -> filter reorder -> fusion -> DCE
v
Fused morsel-driven executor
| bytecode over register slots, 1024 elements per morsel
v
Result (ray_t)
Memory Model
- Buddy allocator with thread-local arenas for contention-free allocation
- Slab cache for small, frequently-allocated objects
- COW ref counting —
ray_cow()returns a private copy only when the refcount exceeds 1 - Arena (bump) allocator (
ray_arena_t) for bulk short-lived allocations; blocks carryRAY_ATTR_ARENA, making retain/release no-ops - Per-VM heaps — each heap carries a
heap_id(u16); cross-heap frees enqueue to a lock-free LIFO, reclaimed viaray_heap_flush_foreign()
Files & Partitions
- Column files — native binary format for vectors and CSR graphs, supports mmap
- Sym table — global string intern table, arena-backed, append-only persistence with file locking
- CSV loader — mmap-based, parallel parse, automatic type inference, null handling, sym merge
- File I/O — cross-platform locking (flock/LockFileEx), fsync, atomic rename