---
title: SQL
created_date: 2024-10-28
updated_date: 2024-10-28
aliases:
tags:
---
# SQL

SQL stands for Structured Query Language and is used to retrieve and modify entries in a [[Database]].
Many different implementations exist, such as MySQL and SQLite.

## Crash Course Takeaways
```SQL
SELECT (DISTINCT) [column names, or * for everything]
FROM table_name
WHERE selection_statements
ORDER BY column_name (DESC)
-- only return 20 entries
LIMIT 20;
```


> [!Info] DISTINCT keyword
> This ensures that only unique rows are returned. It filters out duplicates, which is an important step when computing statistics on a dataset. Be careful, though: depending on the selected columns it may collapse genuine data points rather than duplicates (e.g. `SELECT DISTINCT last_name ...` returns a shared last name only once, even if several siblings have it).
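
The DISTINCT pitfall from the callout can be tried out with a small sketch. It assumes Python's built-in `sqlite3` module and an in-memory database; the `people` table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory table, made up for this example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Anna", "Meier"), ("Ben", "Meier"), ("Anna", "Meier")],
)

# DISTINCT on one column collapses the two siblings into a single row
last_names = conn.execute("SELECT DISTINCT last_name FROM people").fetchall()

# DISTINCT on both columns only removes the true duplicate (the second Anna)
full_names = conn.execute(
    "SELECT DISTINCT first_name, last_name FROM people ORDER BY first_name"
).fetchall()

print(last_names)  # [('Meier',)]
print(full_names)  # [('Anna', 'Meier'), ('Ben', 'Meier')]
```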

### Filter
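- Put the filter condition after the **WHERE** keyword
- e.g. `WHERE last_name = 'Connor'` or `WHERE age <= 18` or `WHERE nationality IN ('Swiss', 'Italian', 'French')`
- String literals belong in single quotes; in standard SQL, double quotes denote identifiers such as column names
- `AND`, `OR`, `NOT` are logical operators for combining conditions
- `WHERE name LIKE '%blue%'` returns every name containing "blue" (`%` is a wildcard that matches any sequence of characters, including none)
- `WHERE name LIKE '____'` returns all names with exactly 4 characters (each `_` matches a single character)
- `WHERE name IS (NOT) NULL` keeps only the rows where `name` is (not) NULL; a comparison with `= NULL` never matches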
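
The filters listed above can be tried out with the following sketch. It assumes Python's built-in `sqlite3` module; the `people` table, its rows and the helper `names` are hypothetical. Note that `LIKE` is case-insensitive for ASCII letters in SQLite by default, which may differ in other databases.

```python
import sqlite3

# Hypothetical in-memory table, made up for this example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, nationality TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Anna", 17, "Swiss"),
        ("Bluebell", 30, "French"),
        ("Carl", 45, None),  # None is stored as NULL
        ("Dina", 12, "German"),
    ],
)

def names(where_clause):
    """Return the names matching a WHERE clause, sorted alphabetically.

    The clause is pasted into the query, which is fine for the fixed demo
    strings below; use ? parameters for anything that comes from user input.
    """
    sql = f"SELECT name FROM people WHERE {where_clause} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

minors = names("age <= 18")                                  # Anna, Dina
swiss_or_french = names("nationality IN ('Swiss', 'French')")  # Anna, Bluebell
swiss_minors = names("age <= 18 AND nationality = 'Swiss'")    # Anna
contains_blue = names("name LIKE '%blue%'")                   # Bluebell
four_letters = names("name LIKE '____'")                      # Anna, Carl, Dina
no_nationality = names("nationality IS NULL")                 # Carl
equals_null = names("nationality = NULL")                     # empty: never matches
```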
### Sort
You can sort the result with the `ORDER BY` keyword. The default order is ascending; to reverse it, add the `DESC` keyword after the column name.
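
A minimal sketch of both sort directions, assuming Python's built-in `sqlite3` module and a made-up `people` table:

```python
import sqlite3

# Hypothetical in-memory table, made up for this example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Anna", 17), ("Carl", 45), ("Ben", 30)],
)

# Ascending is the default order (youngest first)
ascending = conn.execute("SELECT name FROM people ORDER BY age").fetchall()

# DESC reverses the order (oldest first)
descending = conn.execute("SELECT name FROM people ORDER BY age DESC").fetchall()

print(ascending)   # [('Anna',), ('Ben',), ('Carl',)]
print(descending)  # [('Carl',), ('Ben',), ('Anna',)]
```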

### CASE Statement
The `CASE` statement lets you transform values on the fly (e.g. for grouping) without changing the underlying database entries.
```SQL
SELECT EmployeeName,
CASE
	WHEN EmpLevel = 1 THEN 'Data Analyst'
	WHEN EmpLevel = 2 THEN 'Middle Manager'
	WHEN EmpLevel = 3 THEN 'Senior Executive'
	ELSE 'Unemployed' -- used when no WHEN branch matches
END AS JobTitle -- alias that names the derived column
FROM Employees;
```
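
To check that `CASE` only changes the query result and not the stored data, here is a runnable sketch. It assumes Python's built-in `sqlite3` module; the rows and the `JobTitle` alias are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory table, made up for this example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeName TEXT, EmpLevel INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Ada", 1), ("Bob", 3), ("Cy", None)],
)

# Map the numeric level to a label; rows without a matching level hit ELSE
rows = conn.execute(
    """
    SELECT EmployeeName,
           CASE
               WHEN EmpLevel = 1 THEN 'Data Analyst'
               WHEN EmpLevel = 2 THEN 'Middle Manager'
               WHEN EmpLevel = 3 THEN 'Senior Executive'
               ELSE 'Unemployed'
           END AS JobTitle
    FROM Employees
    ORDER BY EmployeeName
    """
).fetchall()

# The stored values are untouched by the query above
stored = conn.execute(
    "SELECT EmpLevel FROM Employees ORDER BY EmployeeName"
).fetchall()

print(rows)    # [('Ada', 'Data Analyst'), ('Bob', 'Senior Executive'), ('Cy', 'Unemployed')]
print(stored)  # [(1,), (3,), (None,)]
```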

### Limit Keyword
With the `LIMIT` keyword you can cap the number of rows a query returns, e.g. `LIMIT 20` returns at most 20 rows.
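
`LIMIT` is often combined with `ORDER BY` to get a "top N" result. A small sketch, assuming Python's built-in `sqlite3` module and a made-up `scores` table:

```python
import sqlite3

# Hypothetical in-memory table, made up for this example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 10), ("b", 40), ("c", 25), ("d", 5)],
)

# Sort by points (highest first) and keep only the first two rows
top_two = conn.execute(
    "SELECT player, points FROM scores ORDER BY points DESC LIMIT 2"
).fetchall()

print(top_two)  # [('b', 40), ('c', 25)]
```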

### Functions
- `COUNT` returns the number of rows of a query: `SELECT COUNT(*) FROM names;`
- `SUM` adds up all values of a column
- `MIN` returns the smallest value
- `MAX` returns the largest value
- `AVG` returns the average (NULLs are ignored)

These aggregate functions can also be applied per subgroup of the returned rows. To do so, use `GROUP BY column_name` (similar to `groupby` in the [[Pandas]] library).

```SQL
-- Counts the players on each team
SELECT Team, COUNT(PlayerID)
FROM Players
GROUP BY Team;
```

To filter the groups produced by `GROUP BY` based on an aggregated value, use the `HAVING` keyword. Unlike `WHERE`, which filters rows before grouping, `HAVING` filters after the aggregation.
```SQL
-- Only returns account types that have more than 100 accounts
SELECT AccountType, COUNT(AccountID)  
FROM Accounts  
GROUP BY AccountType  
HAVING COUNT(AccountID) > 100;
```
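
`GROUP BY` and `HAVING` can be compared side by side in the sketch below. It assumes Python's built-in `sqlite3` module; the `Players` rows and the threshold of 2 are made up so the result stays small.

```python
import sqlite3

# Hypothetical in-memory table, made up for this example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Players (PlayerID INTEGER, Team TEXT)")
conn.executemany(
    "INSERT INTO Players VALUES (?, ?)",
    [(1, "Red"), (2, "Red"), (3, "Red"), (4, "Blue"), (5, "Green"), (6, "Green")],
)

# One result row per team with its player count
per_team = conn.execute(
    "SELECT Team, COUNT(PlayerID) FROM Players GROUP BY Team ORDER BY Team"
).fetchall()

# HAVING drops the groups whose count is below the threshold
big_teams = conn.execute(
    """
    SELECT Team, COUNT(PlayerID)
    FROM Players
    GROUP BY Team
    HAVING COUNT(PlayerID) >= 2
    ORDER BY Team
    """
).fetchall()

print(per_team)   # [('Blue', 1), ('Green', 2), ('Red', 3)]
print(big_teams)  # [('Green', 2), ('Red', 3)]
```
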
#### COALESCE to default nulls
`COALESCE` returns its first non-NULL argument, which makes it handy for replacing NULLs with a default value.
```SQL
SELECT AVG(COALESCE(HolidaysTaken, 0))
FROM AnnualLeave;

-- or pick the first available of the three phone numbers
SELECT  
	CustomerName,  
	COALESCE(HomePhone, MobilePhone, BusinessPhone) as PhoneNumber  
FROM Customers;
```
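
Why the default matters for `AVG`: aggregate functions skip NULLs, so the average changes depending on whether missing values are counted as 0. A sketch assuming Python's built-in `sqlite3` module and made-up `AnnualLeave` rows:

```python
import sqlite3

# Hypothetical in-memory table, made up for this example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AnnualLeave (Employee TEXT, HolidaysTaken INTEGER)")
conn.executemany(
    "INSERT INTO AnnualLeave VALUES (?, ?)",
    [("Ada", 10), ("Bob", None), ("Cy", 20)],
)

# AVG skips the NULL row: (10 + 20) / 2
avg_ignoring_nulls = conn.execute(
    "SELECT AVG(HolidaysTaken) FROM AnnualLeave"
).fetchone()[0]

# COALESCE turns the NULL into 0 first: (10 + 0 + 20) / 3
avg_nulls_as_zero = conn.execute(
    "SELECT AVG(COALESCE(HolidaysTaken, 0)) FROM AnnualLeave"
).fetchone()[0]

print(avg_ignoring_nulls)  # 15.0
print(avg_nulls_as_zero)   # 10.0
```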

### Joins
#### Inner Join (aka join)
Combines the columns of two tables and returns only those rows for which the join condition finds a match in both tables.
#### Left Join (aka left outer join)
This join includes all rows of the first (left) table even if there is no match in the second table; the columns of the second table are then filled with NULL.
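
The difference between the two joins shows up as soon as one row has no partner. A sketch assuming Python's built-in `sqlite3` module; the `Customers` and `Orders` tables and their rows are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory tables, made up for this example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, Item TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [(100, 1, "Book")])

# Inner join: Bob has no order, so he does not appear
inner = conn.execute(
    """
    SELECT c.CustomerName, o.Item
    FROM Customers AS c
    INNER JOIN Orders AS o ON o.CustomerID = c.CustomerID
    ORDER BY c.CustomerName
    """
).fetchall()

# Left join: Bob is kept, and the missing order column becomes NULL (None)
left = conn.execute(
    """
    SELECT c.CustomerName, o.Item
    FROM Customers AS c
    LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID
    ORDER BY c.CustomerName
    """
).fetchall()

print(inner)  # [('Ada', 'Book')]
print(left)   # [('Ada', 'Book'), ('Bob', None)]
```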

### INSERT INTO
`INSERT INTO` adds new rows to a table.
### UPDATE
`UPDATE` changes existing rows; a `WHERE` clause selects which rows are affected.
### DELETE
`DELETE` removes all rows that match its `WHERE` clause. Careful, this is dangerous: without a `WHERE` clause, every row of the table is deleted!
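
The three write statements can be sketched as follows, assuming Python's built-in `sqlite3` module and a made-up `people` table. The last statement shows why a `DELETE` without `WHERE` is risky.

```python
import sqlite3

# Hypothetical in-memory table, made up for this example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")

# INSERT INTO adds rows
conn.execute("INSERT INTO people (name, age) VALUES ('Anna', 17)")
conn.execute("INSERT INTO people (name, age) VALUES ('Ben', 30)")

# UPDATE changes only the rows matched by WHERE
conn.execute("UPDATE people SET age = 18 WHERE name = 'Anna'")
after_update = conn.execute("SELECT name, age FROM people ORDER BY name").fetchall()

# DELETE removes only the rows matched by WHERE
conn.execute("DELETE FROM people WHERE name = 'Ben'")
after_delete = conn.execute("SELECT name, age FROM people ORDER BY name").fetchall()

# DELETE without WHERE empties the whole table
conn.execute("DELETE FROM people")
remaining = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]

print(after_update)  # [('Anna', 18), ('Ben', 30)]
print(after_delete)  # [('Anna', 18)]
print(remaining)     # 0
```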

## Examples
- The queries used in Obsidian's Dataview plugin are quite similar to SQL.