Skip to content

Select Query

The select function performs data selection, filtering, and aggregation on Tables. select operations are automatically optimized and parallelize aggregations when possible. Grouping and filtering are highly efficient on columnar data.

(set employees (table [name dept salary hire_date] 
  (list 
    (list "Alice" "Bob" "Charlie" "David") 
    ['IT 'HR 'IT 'IT] 
    [75000 65000 85000 72000] 
    [2021.01.15 2020.03.20 2019.11.30 2022.05.10])))

;; Select employees with salary > 70000, grouped by department
(select {
  avg_salary: (avg salary)
  headcount: (count name)
  from: employees
  where: (> salary 70000)
  by: dept})
┌──────┬────────────┬───────────┐
 dept  avg_salary  headcount 
├──────┼────────────┼───────────┤
 IT    77333.33    3         
├──────┴────────────┴───────────┤
 1 rows (1 shown) 3 columns    
└───────────────────────────────┘

All select parameters are provided as a single Dictionary. The from clause is required.

Column Name Conflicts

If a column name matches a built-in function or environment variable, use (at table 'column) to access it:

(select {
  name: (at employees 'name)
  from: employees})

Filtering with where

Use the where clause to filter rows based on conditions. The where clause accepts any boolean expression that evaluates to a Vector of Booleans or a single boolean value.

;; Select employees with salary greater than 70000
(select {
  name: name 
  salary: salary 
  from: employees 
  where: (> salary 70000)})
┌─────────┬────────┐
 name     salary 
├─────────┼────────┤
 Alice    75000  
 Charlie  85000  
 David    72000  
└─────────┴────────┘

You can use complex conditions with logical operators:

;; Select IT employees with salary between 70000 and 80000
(select {
  name: name 
  salary: salary 
  from: employees 
  where: (and (= dept 'IT) (>= salary 70000) (<= salary 80000))})

Aggregation

Use Aggregations to compute columns:

(select {
  total_salary: (sum salary) 
  avg_salary: (avg salary) 
  headcount: (count name) 
  from: employees})
┌──────────────┬────────────┬───────────┐
 total_salary  avg_salary  headcount 
├──────────────┼────────────┼───────────┤
 297000        74250.00    4         
└──────────────┴────────────┴───────────┘

Grouping with by

Group rows using the by keyword to perform aggregations per group. The by clause accepts a Symbol or a Dictionary of symbols.

Single Column Grouping

;; Group by department and calculate statistics
(select {
  avg_salary: (avg salary) 
  headcount: (count name) 
  earliest_hire: (min hire_date) 
  from: employees 
  by: dept})
┌──────┬────────────┬───────────┬───────────────┐
 dept  avg_salary  headcount  earliest_hire 
├──────┼────────────┼───────────┼───────────────┤
 IT    77333.33    3          2019.11.30    
 HR    65000.00    1          2020.03.20    
└──────┴────────────┴───────────┴───────────────┘

Multiple Column Grouping

Group by multiple columns using a Dictionary:

;; Group by both department and region
(select {
  dept: dept 
  region: region 
  avg_salary: (avg salary) 
  from: employees 
  by: {dept: dept region: region}})