Skip to content

.csv.* — CSV import and export

CSV is Rayfall's bulk text I/O format. The .csv.* builtins cover four shapes: in-memory read, in-memory write, direct-to-splayed conversion, and direct-to-partitioned conversion. The reader is parallel and zero-copy on the input side (mmap), supports automatic type inference, and accepts an optional schema for deterministic typing. The splayed and parted variants stream straight from the source CSV into on-disk column files without ever materialising the full table in memory — useful for files that don't fit.

Restricted under -U

All four CSV builtins (.csv.read, .csv.write, .csv.splayed, .csv.parted) are RAY_FN_RESTRICTED and return an access error when invoked from an IPC peer under a -U server. Local REPL / script execution is unaffected.

Reference

Function Arity Flags Description
.csv.read variadic restricted Parse a CSV file into an in-memory table.
.csv.write variadic restricted Serialise a table to a CSV file.
.csv.splayed variadic restricted Stream a CSV directly into a splayed on-disk table; return it loaded.
.csv.parted variadic restricted Stream a CSV directly into a partitioned table under a db root.

.csv.read

Signatures:

  • (.csv.read "path") — auto-detect column types from the header + a content sample.
  • (.csv.read [types] "path") — types is a SYM vector naming BOOL, I64, F64, STR, SYMBOL, F32, DATE, TIME, TIMESTAMP, GUID (case-insensitive). One entry per column.
  • (.csv.read [names] [types] "path") — also override the column names. With explicit names the input is assumed to have no header row.

Returns: a table. Empty fields and the empty string are read as null with RAY_ATTR_HAS_NULLS set on the column.

Errors: type (bad arg), domain (path is null / too long, or zero columns), io (open/read failure), limit (>256 columns).

Examples:

;; Auto-typed
(set t (.csv.read "trades.csv"))

;; Force schema — explicit column types
(set t (.csv.read [SYM F64 I64 DATE] "trades.csv"))

;; Force schema AND headerless input
(set t (.csv.read [sym price qty date] [SYM F64 I64 DATE] "trades.csv"))

The reader memory-maps the input, splits it into chunks, parses chunks in parallel, then merges column types and symbol-intern tables. Speedup is near-linear with core count on files of 100 MB+.

.csv.write

Signature: (.csv.write tbl "path"). Writes tbl with a header row and comma delimiter. Returns 0 on success.

Errors: type (tbl isn't a table, path isn't a string), domain (empty path), io on write failure.

(.csv.write trades "out.csv")

.csv.splayed

Signatures:

  • (.csv.splayed "src.csv" "out_dir")
  • (.csv.splayed [types] "src.csv" "out_dir")
  • (.csv.splayed [names] [types] "src.csv" "out_dir")

Streams src.csv chunk-by-chunk through the parser into a splayed directory at out_dir (one column file per field, plus a sym symbol-table file). Returns the resulting splayed table loaded with ray_read_splayed — zero-copy mmap, so the bytes stay on disk.

Use this when the CSV is too large to fit in memory but you want to query it like an in-memory table.

(set big (.csv.splayed [I64 F64 SYM] "huge.csv" "/data/huge"))
;; big is now a splayed table mmap'd from /data/huge/.

Errors propagate from the CSV parser (type, domain, limit) and from the splayed writer (io, oom).

.csv.parted

Signatures (with optional rows-per-part):

  • (.csv.parted "src.csv" "db_root" 'tbl_name)
  • (.csv.parted [types] "src.csv" "db_root" 'tbl_name)
  • (.csv.parted [names] [types] "src.csv" "db_root" 'tbl_name)
  • (.csv.parted [...]types "src.csv" rows_per_part "db_root" 'tbl_name) — pass a positive integer just before the db-root to override the default partition size.

Streams the CSV into a partitioned table under db_root/tbl_name/, creating subdirectories per partition. Returns the parted table loaded back from disk.

Constraints on tbl_name: must not start with ., must not contain / or \\ or ...

;; Default partition size, schema-driven
(set t (.csv.parted [SYM F64 I64 DATE] "trades.csv" "/data/db" 'trades))

;; Custom rows per partition (10M rows each)
(set t (.csv.parted [SYM F64 I64 DATE] "trades.csv" 10000000 "/data/db" 'trades))

See .db.parted.get and .db.parted.mount for how to reload the resulting layout.

See also