Data Loading & Storage Tutorial
Read CSV files, explore and filter data, serialize tables to binary, persist with splayed storage, and export results — all from the Rayfall REPL.
This tutorial assumes you have built Rayforce and can start the REPL with ./rayforce. We use /tmp/rayforce-test/ as the working directory for all file operations.
1. Reading CSV Files
Create a CSV file on disk, then load it with .csv.read. The first row is treated as column headers, and types are inferred automatically:
(set trades (.csv.read "/tmp/rayforce-test/trades.csv"))
trades
Assuming trades.csv contains:
Symbol,Price,Qty
AAPL,150.5,100
GOOG,280.0,200
MSFT,420.0,50
AAPL,155.0,300
GOOG,275.5,150
Output:
┌────────┬───────┬────────────────────┐
│ Symbol │ Price │ Qty                │
│ sym    │ f64   │ i64                │
├────────┼───────┼────────────────────┤
│ AAPL   │ 150.5 │ 100                │
│ GOOG   │ 280.0 │ 200                │
│ MSFT   │ 420.0 │ 50                 │
│ AAPL   │ 155.0 │ 300                │
│ GOOG   │ 275.5 │ 150                │
├────────┴───────┴────────────────────┤
│ 5 rows (5 shown) 3 columns (3 shown)│
└─────────────────────────────────────┘
String-like columns (e.g. ticker symbols, names) are loaded as sym (dictionary-encoded symbols). Numeric columns are inferred as f64 if any value has a decimal point, and as i64 otherwise.
You can also create tables in-memory and skip CSV entirely:
(set trades (table [Symbol Price Qty]
(list
[AAPL GOOG MSFT AAPL GOOG]
[150.5 280.0 420.0 155.0 275.5]
[100 200 50 300 150])))
2. Exploring Data
Use count, type, and show to inspect a loaded table:
(count trades)
5
(type trades)
TABLE
show limits the display to the first n rows — useful for large datasets:
(show trades 3)
┌────────┬───────┬────────────────────┐
│ Symbol │ Price │ Qty                │
│ sym    │ f64   │ i64                │
├────────┼───────┼────────────────────┤
│ AAPL   │ 150.5 │ 100                │
│ GOOG   │ 280.0 │ 200                │
│ MSFT   │ 420.0 │ 50                 │
├────────┴───────┴────────────────────┤
│ 5 rows (3 shown) 3 columns (3 shown)│
└─────────────────────────────────────┘
3. Filtering Rows
Use select with a where: clause to filter rows. Find all trades where the price exceeds 200:
(select {from:trades where: (> Price 200.0)})
┌────────┬───────┬────────────────────┐
│ Symbol │ Price │ Qty                │
│ sym    │ f64   │ i64                │
├────────┼───────┼────────────────────┤
│ GOOG   │ 280.0 │ 200                │
│ MSFT   │ 420.0 │ 50                 │
│ GOOG   │ 275.5 │ 150                │
├────────┴───────┴────────────────────┤
│ 3 rows (3 shown) 3 columns (3 shown)│
└─────────────────────────────────────┘
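The same pattern should work for other columns. A minimal sketch, assuming > compares an i64 column against an integer literal the same way it compares Price against a float literal above:

```rayfall
; Keep only trades of more than 100 shares.
(select {from:trades where: (> Qty 100)})
```

With the sample data, three rows satisfy the condition: GOOG (200), AAPL (300), and GOOG (150).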
4. Grouping and Aggregation
Group rows with a by: clause and compute aggregates as named output columns:
(select {from:trades by: Symbol
total_qty: (sum Qty)
avg_price: (avg Price)})
┌────────┬───────────┬────────────────┐
│ Symbol │ total_qty │ avg_price      │
│ sym    │ i64       │ f64            │
├────────┼───────────┼────────────────┤
│ AAPL   │ 400       │ 152.75         │
│ GOOG   │ 350       │ 277.75         │
│ MSFT   │ 50        │ 420.0          │
├────────┴───────────┴────────────────┤
│ 3 rows (3 shown) 3 columns (3 shown)│
└─────────────────────────────────────┘
Available aggregation functions: sum, avg, min, max, count, first, last, med.
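Several of these can be combined in one query. The sketch below assumes min, max, and count take a column argument in the same way sum and avg do in the example above; the output column names min_price, max_price, and n_trades are made up for illustration:

```rayfall
; Per-symbol price range and number of trades.
; min_price, max_price, n_trades are illustrative names, not built-ins.
(select {from:trades by: Symbol
         min_price: (min Price)
         max_price: (max Price)
         n_trades: (count Qty)})
```

With the sample data, AAPL and GOOG each have two trades and MSFT has one, so MSFT's minimum and maximum price would both be 420.0.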
5. Binary Serialization
Use ser to serialize any Rayforce object to a compact binary format (a U8 byte vector), and de to deserialize it back. This is useful for caching, IPC, or storing intermediate results:
(set bytes (ser trades))
(count bytes)
197
The entire table is encoded in 197 bytes. Deserialize with de:
(set restored (de bytes))
restored
┌────────┬───────┬────────────────────┐
│ Symbol │ Price │ Qty                │
│ sym    │ f64   │ i64                │
├────────┼───────┼────────────────────┤
│ AAPL   │ 150.5 │ 100                │
│ GOOG   │ 280.0 │ 200                │
│ MSFT   │ 420.0 │ 50                 │
│ AAPL   │ 155.0 │ 300                │
│ GOOG   │ 275.5 │ 150                │
├────────┴───────┴────────────────────┤
│ 5 rows (5 shown) 3 columns (3 shown)│
└─────────────────────────────────────┘
The deserialized table is identical to the original — all types, column names, and values are preserved.
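Because ser accepts any object, a query result can be serialized in the same way. A small sketch that round-trips a filtered table and checks the row count afterwards; the names big and big_bytes are chosen here for illustration:

```rayfall
; Serialize only the trades priced above 200.
(set big (select {from:trades where: (> Price 200.0)}))
(set big_bytes (ser big))
; Size of the serialized subset in bytes.
(count big_bytes)
; Row count after the round trip; it should match (count big).
(count (de big_bytes))
```

With the sample data the filtered table has three rows, so the last expression should return 3. The byte count is not stated here because it depends on the encoding.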
6. Splayed Tables
Splayed tables store each column as a separate file on disk. This is Rayforce's native columnar format: it is faster than CSV and preserves exact types. Use .db.splayed.set to save and .db.splayed.get to load:
(.db.splayed.set "/tmp/rayforce-test/trades_db" trades)
┌────────┬───────┬────────────────────┐
│ Symbol │ Price │ Qty                │
│ sym    │ f64   │ i64                │
├────────┼───────┼────────────────────┤
│ AAPL   │ 150.5 │ 100                │
│ GOOG   │ 280.0 │ 200                │
│ MSFT   │ 420.0 │ 50                 │
│ AAPL   │ 155.0 │ 300                │
│ GOOG   │ 275.5 │ 150                │
├────────┴───────┴────────────────────┤
│ 5 rows (5 shown) 3 columns (3 shown)│
└─────────────────────────────────────┘
On disk, this creates one file per column plus a symbol table:
/tmp/rayforce-test/trades_db/
Symbol Price Qty sym sym.lk
Load it back in a new session (or the same one) with .db.splayed.get:
(set loaded (.db.splayed.get "/tmp/rayforce-test/trades_db"))
loaded
┌────────┬───────┬────────────────────┐
│ Symbol │ Price │ Qty                │
│ sym    │ f64   │ i64                │
├────────┼───────┼────────────────────┤
│ AAPL   │ 150.5 │ 100                │
│ GOOG   │ 280.0 │ 200                │
│ MSFT   │ 420.0 │ 50                 │
│ AAPL   │ 155.0 │ 300                │
│ GOOG   │ 275.5 │ 150                │
├────────┴───────┴────────────────────┤
│ 5 rows (5 shown) 3 columns (3 shown)│
└─────────────────────────────────────┘
All types are preserved exactly — f64 stays f64, sym stays sym. No type inference needed on reload.
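A table loaded this way should be usable like any other table. The sketch below assumes that count and select behave on a splayed-loaded table as they do on an in-memory one, which this tutorial does not state explicitly:

```rayfall
; loaded comes from the .db.splayed.get call above.
(count loaded)
; Filter the reloaded table exactly as in section 3.
(select {from:loaded where: (> Price 200.0)})
```

If the round trip worked, the count should be 5 and the filter should return the same three rows as in section 3.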
7. Writing CSV
Export any table to CSV with .csv.write. The return value 0 indicates success:
(set big (select {from:trades where: (> Price 200.0)}))
(.csv.write big "/tmp/rayforce-test/big_trades.csv")
0
Verify by reading it back:
(.csv.read "/tmp/rayforce-test/big_trades.csv")
┌────────┬───────┬────────────────────┐
│ Symbol │ Price │ Qty                │
│ sym    │ f64   │ i64                │
├────────┼───────┼────────────────────┤
│ GOOG   │ 280.0 │ 200                │
│ MSFT   │ 420.0 │ 50                 │
│ GOOG   │ 275.5 │ 150                │
├────────┴───────┴────────────────────┤
│ 3 rows (3 shown) 3 columns (3 shown)│
└─────────────────────────────────────┘
Note: if every value in a float column is a whole number (e.g. 280.0) and is written without a fractional part, the column may be read back as i64, since the CSV reader infers the narrowest fitting type. Use splayed tables when exact type preservation matters.
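One way to see this caveat is to round-trip a table whose float column holds only whole numbers. A minimal sketch, assuming the table constructor accepts two columns the same way it accepts three; the name whole and the path whole.csv are made up for illustration:

```rayfall
; Price holds only whole-number floats, so it is f64 in memory.
(set whole (table [Symbol Price]
  (list
    [AAPL GOOG]
    [150.0 280.0])))
(.csv.write whole "/tmp/rayforce-test/whole.csv")
; Inspect the type row of the result: Price may now show i64 instead of f64.
(.csv.read "/tmp/rayforce-test/whole.csv")
```

Whether the type actually changes depends on how the writer formats whole-number floats, so treat this as a check to run rather than a guaranteed outcome.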
8. Complete Example
End-to-end workflow: create data, save as CSV, reload, filter, serialize, deserialize and verify, then persist as a splayed table and load it back:
; 1. Create a table
(set trades (table [Symbol Price Qty]
(list
[AAPL GOOG MSFT AAPL GOOG]
[150.5 280.0 420.0 155.0 275.5]
[100 200 50 300 150])))
; 2. Save to CSV
(.csv.write trades "/tmp/rayforce-test/trades.csv")
; 3. Reload from CSV
(set loaded (.csv.read "/tmp/rayforce-test/trades.csv"))
; 4. Filter: only trades above 200
(set big (select {from:loaded where: (> Price 200.0)}))
; 5. Serialize to binary
(set bytes (ser big))
; 6. Deserialize and verify
(set restored (de bytes))
(count restored)
; 7. Save as splayed table for fast reload
(.db.splayed.set "/tmp/rayforce-test/big_db" big)
; 8. Load splayed table back
(.db.splayed.get "/tmp/rayforce-test/big_db")
┌────────┬───────┬────────────────────┐
│ Symbol │ Price │ Qty                │
│ sym    │ f64   │ i64                │
├────────┼───────┼────────────────────┤
│ GOOG   │ 280.0 │ 200                │
│ MSFT   │ 420.0 │ 50                 │
│ GOOG   │ 275.5 │ 150                │
├────────┴───────┴────────────────────┤
│ 3 rows (3 shown) 3 columns (3 shown)│
└─────────────────────────────────────┘
Storage Format Comparison
| Format | Function | Preserves Types | Best For |
|---|---|---|---|
| CSV | .csv.read / .csv.write | No (re-inferred on load) | Interoperability, human-readable data |
| Binary | ser / de | Yes | Caching, IPC, embedding in messages |
| Splayed | .db.splayed.set / .db.splayed.get | Yes | Persistent storage, fast reload, mmap |
Next Steps
- Getting Started Tutorial — Tables, filtering, joins, pivots, and sorting
- Data Persistence — Partitioned storage and advanced file I/O
- Analytics Cookbook — Time-series, top-N, ASOF joins
- Functions Reference — Complete list of all built-in functions