Select Query¶

The select function performs data selection, filtering, and aggregation on Tables. select operations are automatically optimized and parallelize aggregations when possible. Grouping and filtering are highly efficient on columnar data.

(set employees (table [name dept salary hire_date] 
  (list 
    (list "Alice" "Bob" "Charlie" "David") 
    ['IT 'HR 'IT 'IT] 
    [75000 65000 85000 72000] 
    [2021.01.15 2020.03.20 2019.11.30 2022.05.10])))

;; Select employees with salary > 70000, grouped by department
(select {
  avg_salary: (avg salary)
  headcount: (count name)
  from: employees
  where: (> salary 70000)
  by: dept})
┌──────┬────────────┬───────────┐
│ dept │ avg_salary │ headcount │
├──────┼────────────┼───────────┤
│ IT   │ 77333.33   │ 3         │
├──────┴────────────┴───────────┤
│ 1 rows (1 shown) 3 columns    │
└───────────────────────────────┘

All select parameters are provided as a single Dictionary. The from clause is required.

Column Name Conflicts

If a column name matches a built-in function or environment variable, use (at table 'column) to access it:

(select {
  name: (at employees 'name)
  from: employees})

Filtering with `where`¶

Use the where clause to filter rows based on conditions. The where clause accepts any boolean expression that evaluates to a Vector of Booleans or a single boolean value.

;; Select employees with salary greater than 70000
(select {
  name: name 
  salary: salary 
  from: employees 
  where: (> salary 70000)})
┌─────────┬────────┐
│ name    │ salary │
├─────────┼────────┤
│ Alice   │ 75000  │
│ Charlie │ 85000  │
│ David   │ 72000  │
└─────────┴────────┘

You can use complex conditions with logical operators:

;; Select IT employees with salary between 70000 and 80000
(select {
  name: name 
  salary: salary 
  from: employees 
  where: (and (= dept 'IT) (>= salary 70000) (<= salary 80000))})

Aggregation¶

Use Aggregations to compute columns:

(select {
  total_salary: (sum salary) 
  avg_salary: (avg salary) 
  headcount: (count name) 
  from: employees})
┌──────────────┬────────────┬───────────┐
│ total_salary │ avg_salary │ headcount │
├──────────────┼────────────┼───────────┤
│ 297000       │ 74250.00   │ 4         │
└──────────────┴────────────┴───────────┘

Grouping with `by`¶

Group rows using the by keyword to perform aggregations per group. The by clause accepts a Symbol or a Dictionary of symbols.

Single Column Grouping¶

;; Group by department and calculate statistics
(select {
  avg_salary: (avg salary) 
  headcount: (count name) 
  earliest_hire: (min hire_date) 
  from: employees 
  by: dept})
┌──────┬────────────┬───────────┬───────────────┐
│ dept │ avg_salary │ headcount │ earliest_hire │
├──────┼────────────┼───────────┼───────────────┤
│ IT   │ 77333.33   │ 3         │ 2019.11.30    │
│ HR   │ 65000.00   │ 1         │ 2020.03.20    │
└──────┴────────────┴───────────┴───────────────┘

Multiple Column Grouping¶

Group by multiple columns using a Dictionary:

;; Group by both department and region
(select {
  dept: dept 
  region: region 
  avg_salary: (avg salary) 
  from: employees 
  by: {dept: dept region: region}})

Select Query¶

Filtering with where¶

Aggregation¶

Grouping with by¶

Single Column Grouping¶

Multiple Column Grouping¶

Filtering with `where`¶

Grouping with `by`¶