Joins¶
Join operations combine rows from two Tables based on matching column values.
Left Join¶
Keeps all rows from the left table and matching rows from the right table. Missing matches are filled with null values. Use left join when you want to preserve all records from the left table.
(set trades (table [symbol order_id price quantity]
(list ['AAPL 'MSFT 'GOOG] [1001 1002 1003] [150.25 300.50 125.75] [100 200 150])))
┌────────┬──────────┬────────┬──────────┐
│ symbol │ order_id │ price │ quantity │
├────────┼──────────┼────────┼──────────┤
│ AAPL │ 1001 │ 150.25 │ 100 │
│ MSFT │ 1002 │ 300.50 │ 200 │
│ GOOG │ 1003 │ 125.75 │ 150 │
├────────┴──────────┴────────┴──────────┤
│ 3 rows (3 shown) 4 columns (4 shown) │
└───────────────────────────────────────┘
(set orders (table [order_id client_id timestamp status]
(list [1001 1002 1004]
['CLIENT_A 'CLIENT_B 'CLIENT_C]
[09:00:00 09:05:00 09:10:00]
['FILLED 'FILLED 'PENDING])))
┌──────────┬───────────┬──────────────┬─────────┐
│ order_id │ client_id │ timestamp │ status │
├──────────┼───────────┼──────────────┼─────────┤
│ 1001 │ CLIENT_A │ 09:00:00.000 │ FILLED │
│ 1002 │ CLIENT_B │ 09:05:00.000 │ FILLED │
│ 1004 │ CLIENT_C │ 09:10:00.000 │ PENDING │
├──────────┴───────────┴──────────────┴─────────┤
│ 3 rows (3 shown) 4 columns (4 shown) │
└───────────────────────────────────────────────┘
(left-join [order_id] trades orders)
┌──────────┬──────────────┬────────┬────────┬──────────┬───────────┬────────┐
│ order_id │ timestamp │ symbol │ price │ quantity │ client_id │ status │
├──────────┼──────────────┼────────┼────────┼──────────┼───────────┼────────┤
│ 1001 │ 09:00:00.000 │ AAPL │ 150.25 │ 100 │ CLIENT_A │ FILLED │
│ 1002 │ 09:05:00.000 │ MSFT │ 300.50 │ 200 │ CLIENT_B │ FILLED │
│ 1003 │ Null │ GOOG │ 125.75 │ 150 │ Null │ Null │
├──────────┴──────────────┴────────┴────────┴──────────┴───────────┴────────┤
│ 3 rows (3 shown) 7 columns (7 shown) │
└───────────────────────────────────────────────────────────────────────────┘
Multiple Column Join¶
(set positions (table [symbol account_id quantity]
(list ['AAPL 'MSFT 'AAPL] ['ACC001 'ACC001 'ACC002] [1000 500 2000])))
(set prices (table [symbol account_id price]
(list ['AAPL 'MSFT 'AAPL] ['ACC001 'ACC001 'ACC002] [150.25 300.50 150.30])))
(left-join [symbol account_id] positions prices) ;; Left join on multiple columns: symbol and account_id
┌────────┬────────────┬────────┬──────────┐
│ symbol │ account_id │ price │ quantity │
├────────┼────────────┼────────┼──────────┤
│ AAPL │ ACC001 │ 150.25 │ 1000 │
│ MSFT │ ACC001 │ 300.50 │ 500 │
│ AAPL │ ACC002 │ 150.30 │ 2000 │
├────────┴────────────┴────────┴──────────┤
│ 3 rows (3 shown) 4 columns (4 shown) │
└─────────────────────────────────────────┘
The left-join function takes three arguments:
1. A Vector of Symbols representing join columns.
2. The left Table
3. The right Table
Inner Join¶
Keeps only rows where the join columns match in both tables. Returns the intersection of both tables. Use inner join when you only want records that exist in both tables.
(set trades (table [symbol order_id price quantity]
(list ['AAPL 'MSFT 'GOOG] [1001 1002 1003]
[150.25 300.50 125.75] [100 200 150])))
(set settlements (table [order_id settlement_date fee]
(list [1001 1002 1004] [2024.01.15 2024.01.15 2024.01.16] [0.50 1.00 0.75])))
;; Inner join: only trades that have been settled
(inner-join [order_id] trades settlements)
┌──────────┬────────┬────────┬──────────┬─────────────────┬──────┐
│ order_id │ symbol │ price │ quantity │ settlement_date │ fee │
├──────────┼────────┼────────┼──────────┼─────────────────┼──────┤
│ 1001 │ AAPL │ 150.25 │ 100 │ 2024.01.15 │ 0.50 │
│ 1002 │ MSFT │ 300.50 │ 200 │ 2024.01.15 │ 1.00 │
├──────────┴────────┴────────┴──────────┴─────────────────┴──────┤
│ 2 rows (2 shown) 6 columns (6 shown) │
└────────────────────────────────────────────────────────────────┘
Note that GOOG (order_id 1003) is excluded because it has no matching settlement record.
The inner-join function takes three arguments:
1. A Vector of Symbols representing join columns.
2. The left Table
3. The right Table
Asof Join¶
Matches rows based on equality for all join columns except the last one, and finds the greatest value less than or equal to the left table's value for the last column. Ideal for time-series data where you want the most recent value up to a point in time.
(set n 10)
(set tsym (take n (concat (take 99 'AAPL) (take 1 'MSFT))))
(set ttime (+ 09:00:00 (as 'Time (/ (* (til n) 3) 10))))
(set price (+ 10 (til n)))
(set bsym (take (* 2 n) (concat (concat (take 3 'AAPL) (take 2 'MSFT)) (take 1 'GOOG))))
(set btime (+ 09:00:00 (as 'Time (/ (* (til (* 2 n)) 2) 10))))
(set bid (+ 8 (/ (til (* 2 n)) 2)))
(set ask (+ 12 (/ (til (* 2 n)) 2)))
(set trades (table [Sym Ts Price] (list tsym ttime price)))
(set quotes (table [Sym Ts Bid Ask] (list bsym btime bid ask)))
(asof-join [Sym Ts] trades quotes)
┌──────┬──────────────┬───────┬─────┬─────┐
│ Sym │ Ts │ Price │ Bid │ Ask │
├──────┼──────────────┼───────┼─────┼─────┤
│ AAPL │ 09:00:00.000 │ 10 │ 9 │ 13 │
│ AAPL │ 09:00:00.000 │ 11 │ 9 │ 13 │
│ AAPL │ 09:00:00.000 │ 12 │ 9 │ 13 │
│ AAPL │ 09:00:00.000 │ 13 │ 9 │ 13 │
│ AAPL │ 09:00:00.001 │ 14 │ 12 │ 16 │
│ AAPL │ 09:00:00.001 │ 15 │ 12 │ 16 │
│ AAPL │ 09:00:00.001 │ 16 │ 12 │ 16 │
│ AAPL │ 09:00:00.002 │ 17 │ 15 │ 19 │
│ AAPL │ 09:00:00.002 │ 18 │ 15 │ 19 │
│ AAPL │ 09:00:00.002 │ 19 │ 15 │ 19 │
├──────┴──────────────┴───────┴─────┴─────┤
│ 10 rows (10 shown) 5 columns (5 shown) │
└─────────────────────────────────────────┘
The asof-join function takes three arguments:
a Vector of Symbols - the last column is used for "as of" matching (typically a time/timestamp column), the left Table, the right Table
For each row in the left table, it finds the row in the right table where:
- All join columns except the last match exactly
- The last join column in the right table is ≤ the last join column in the left table
- And it's the greatest such value
Window Join¶
The window-join and window-join1 functions are designed for time-series data. They join tables based on equality for all join columns except the last one, and use time windows for the last column. Perfect for aggregating data within time ranges.
(set n 100000)
(set tsym (take n (concat (take 99 'AAPL) (take 1 'MSFT))))
(set ttime (+ 09:00:00 (as 'Time (/ (* (til n) 3) 10))))
(set price (+ 10 (til n)))
(set bsym (take (* 2 n) (concat (concat (take 3 'AAPL) (take 2 'MSFT)) (take 1 'GOOG))))
(set btime (+ 09:00:00 (as 'Time (/ (* (til (* 2 n)) 2) 10))))
(set bid (+ 8 (/ (til (* 2 n)) 2)))
(set ask (+ 12 (/ (til (* 2 n)) 2)))
(set trades (table [Sym Ts Price] (list tsym ttime price)))
(set quotes (table [Sym Ts Bid Ask] (list bsym btime bid ask)))
;; Create intervals: ±1000 milliseconds around each trade timestamp
(set intervals (map-left + [-1000 1000] (at trades 'Ts)))
(
[08:59:59.000 08:59:59.000 08:59:59.000 08:59:59.000 08:59:59.001 08:59:59.001 08:59:59.001..]
[09:00:01.000 09:00:01.000 09:00:01.000 09:00:01.000 09:00:01.001 09:00:01.001 09:00:01.001..]
)
(window-join [Sym Ts] intervals trades quotes { ;; Excludes interval bounds
bid: (min Bid)
ask: (max Ask)})
┌──────┬──────────────┬────────┬───────┬───────┐
│ Sym │ Ts │ Price │ bid │ ask │
├──────┼──────────────┼────────┼───────┼───────┤
│ AAPL │ 09:00:00.000 │ 10 │ 8 │ 2514 │
│ AAPL │ 09:00:00.000 │ 11 │ 8 │ 2514 │
│ AAPL │ 09:00:00.000 │ 12 │ 8 │ 2514 │
│ AAPL │ 09:00:00.000 │ 13 │ 8 │ 2514 │
│ AAPL │ 09:00:00.001 │ 14 │ 8 │ 2515 │
│ AAPL │ 09:00:00.001 │ 15 │ 8 │ 2515 │
│ AAPL │ 09:00:00.001 │ 16 │ 8 │ 2515 │
│ AAPL │ 09:00:00.002 │ 17 │ 8 │ 2518 │
│ AAPL │ 09:00:00.002 │ 18 │ 8 │ 2518 │
│ AAPL │ 09:00:00.002 │ 19 │ 8 │ 2518 │
┆ … ┆ … ┆ … ┆ … ┆ … ┆
│ AAPL │ 09:00:29.997 │ 100000 │ 72501 │ 77506 │
│ AAPL │ 09:00:29.997 │ 100001 │ 72501 │ 77506 │
│ AAPL │ 09:00:29.997 │ 100002 │ 72501 │ 77506 │
│ AAPL │ 09:00:29.997 │ 100003 │ 72501 │ 77506 │
│ AAPL │ 09:00:29.998 │ 100004 │ 72504 │ 77509 │
│ AAPL │ 09:00:29.998 │ 100005 │ 72504 │ 77509 │
│ AAPL │ 09:00:29.998 │ 100006 │ 72504 │ 77509 │
│ AAPL │ 09:00:29.999 │ 100007 │ 72507 │ 77511 │
│ AAPL │ 09:00:29.999 │ 100008 │ 72507 │ 77511 │
│ MSFT │ 09:00:29.999 │ 100009 │ 72507 │ 77510 │
├──────┴──────────────┴────────┴───────┴───────┤
│ 100000 rows (20 shown) 5 columns (5 shown) │
└──────────────────────────────────────────────┘
(window-join1 [Sym Ts] intervals trades quotes { ;; Window join1: same but includes interval bounds
bid: (min Bid)
ask: (max Ask)})
Both functions take five arguments:
columns: A Vector of Symbols - the last column is used for window matchingintervals: A List of two Vectors representing time windows[lower-bounds upper-bounds]left-table: The left Tableright-table: The right Tableaggregations: A Dictionary of aggregations
Creating Intervals¶
Intervals are created using map-left to add time offsets to each timestamp:
;; Create intervals of ±1000 milliseconds around each timestamp
(set intervals (map-left + [-1000 1000] (at trades 'Ts)))
This creates a list of two vectors:
- First vector: lower bounds (timestamp - 1000ms)
- Second vector: upper bounds (timestamp + 1000ms)
The intervals define time windows around each timestamp. For each trade, the window join finds all quotes within that time window and applies the specified aggregations.
Difference between window-join and window-join1
window-join: Excludes interval bounds from aggregation (open interval)window-join1: Includes interval bounds in aggregation (closed interval)
Use window-join1 when you want to include data exactly at the boundary times.