---
title: SQL
created_date: 2024-10-28
updated_date: 2024-10-28
aliases:
tags:
---
# SQL

SQL stands for Structured Query Language. It is used to query and modify the data stored in a relational [[Database]].
Many different implementations exist, such as MySQL and SQLite.

## Crash Course Takeaways
```SQL
-- General shape of a query; parts in parentheses are optional
SELECT (DISTINCT) [column names, or * for everything]
FROM table_name
WHERE selection_statements
ORDER BY column_name (DESC)
-- only return 20 entries
LIMIT 20;
```

> [!Info] DISTINCT keyword
> Ensures that only unique rows are returned. It is used to filter out duplicates, which is an important step when computing statistics on a dataset. Be cautious though: depending on the selected columns it might remove actual data points instead of duplicates (e.g. `SELECT DISTINCT last_name ...` would return a last name shared by several siblings only once).

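
The caveat about `DISTINCT` can be sketched with a tiny example. The `family` table and its rows are made up for illustration, and the statements are written with SQLite in mind; other dialects may need small adjustments.

```SQL
-- Hypothetical table, made up for this example
CREATE TABLE family (first_name TEXT, last_name TEXT);
INSERT INTO family (first_name, last_name) VALUES
    ('Anna', 'Meier'),
    ('Ben',  'Meier'),
    ('Ben',  'Meier');  -- exact duplicate of the previous row

-- Collapses both siblings into a single 'Meier' row
SELECT DISTINCT last_name FROM family;

-- Removes only the exact duplicate; Anna and Ben both remain
SELECT DISTINCT first_name, last_name FROM family;
```
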
### Filter
- Use a selection statement (a condition) after the **WHERE** keyword
- e.g. `WHERE last_name = 'Connor'` or `WHERE age <= 18` or `WHERE nationality IN ('Swiss', 'Italian', 'French')`
- `AND`, `OR`, `NOT` are logical operators
- `WHERE name LIKE '%blue%'` returns any string with "blue" in it (`%` is a wildcard for any number of characters)
- `WHERE name LIKE '____'` returns all names with exactly 4 characters (`_` is a wildcard for a single character)
- `WHERE name IS (NOT) NULL`: keep only the rows where `name` is (not) null

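
The filters above could be combined as in the following sketch. The `people` table and its rows are hypothetical, and the statements assume SQLite.

```SQL
-- Hypothetical table, made up for this example
CREATE TABLE people (name TEXT, age INTEGER, nationality TEXT);
INSERT INTO people (name, age, nationality) VALUES
    ('Anna', 17, 'Swiss'),
    ('Ben',  34, 'Italian'),
    ('Cleo', 52, 'German'),
    (NULL,   41, 'French');

-- Combine conditions with logical operators
SELECT name FROM people
WHERE age <= 18 OR nationality IN ('Italian', 'French');
-- matches Anna, Ben and the row without a name

-- Four underscores match names with exactly four characters
SELECT name FROM people WHERE name LIKE '____';
-- matches Anna and Cleo

-- NULL never matches LIKE or =, so it has to be tested explicitly
SELECT age FROM people WHERE name IS NULL;
-- matches the row with age 41
```
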
### Sort
You can sort the result with the keyword `ORDER BY`. The default order is ascending; to reverse it, add the `DESC` keyword after the column name.

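
A minimal sketch of sorting, assuming SQLite and a hypothetical `scores` table:

```SQL
-- Hypothetical table, made up for this example
CREATE TABLE scores (player TEXT, team TEXT, points INTEGER);
INSERT INTO scores (player, team, points) VALUES
    ('Anna', 'Red',  12),
    ('Ben',  'Blue', 20),
    ('Cleo', 'Red',  25);

-- Ascending is the default: Anna, Ben, Cleo
SELECT player, points FROM scores ORDER BY points;

-- DESC reverses the order: Cleo, Ben, Anna
SELECT player, points FROM scores ORDER BY points DESC;

-- Several sort keys; DESC applies only to the column it follows:
-- Ben (Blue), then Cleo and Anna (Red, highest points first)
SELECT player, team, points FROM scores ORDER BY team, points DESC;
```
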
### CASE Statement
The `CASE` statement lets you transform data on the fly (e.g. for grouping) without changing the underlying database entries.
```SQL
SELECT EmployeeName,
    -- Map the numeric level to a readable label
    CASE
        WHEN EmpLevel = 1 THEN 'Data Analyst'
        WHEN EmpLevel = 2 THEN 'Middle Manager'
        WHEN EmpLevel = 3 THEN 'Senior Executive'
        ELSE 'Unemployed'
    END
FROM Employees;
```

### Limit Keyword
With the `LIMIT` keyword you can limit the number of rows the query returns.

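
`LIMIT` is most useful together with `ORDER BY`, because without an explicit order SQL does not guarantee which rows come back. The sketch below assumes SQLite and a made-up `products` table; note that not every dialect supports `LIMIT` (SQL Server, for example, uses `TOP` instead).

```SQL
-- Hypothetical table, made up for this example
CREATE TABLE products (name TEXT, price INTEGER);
INSERT INTO products (name, price) VALUES
    ('Pen',    2),
    ('Book',  15),
    ('Lamp',  40),
    ('Desk', 120);

-- The two most expensive products: Desk, then Lamp
SELECT name, price
FROM products
ORDER BY price DESC
LIMIT 2;
```
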
### Functions
- `COUNT`: returns the number of rows of a query: `SELECT COUNT(*) FROM names;`
- `SUM`: sums all values of a column
- `MIN`: smallest value
- `MAX`: largest value
- `AVG`: average value

These aggregate functions can also be applied to subgroups of the returned rows. To do this, use `GROUP BY column_name` (similar to `groupby` in the [[Pandas]] library).

```SQL
-- Count the players on each team
SELECT Team, COUNT(PlayerID)
FROM Players
GROUP BY Team;
```

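
The other aggregate functions work per group in the same way. The following sketch assumes SQLite; the `Goals` column, the sample rows and the `AS` aliases are made up for illustration.

```SQL
-- Hypothetical table and data, made up for this example
CREATE TABLE Players (PlayerID INTEGER PRIMARY KEY, Team TEXT, Goals INTEGER);
INSERT INTO Players (PlayerID, Team, Goals) VALUES
    (1, 'Red',  4),
    (2, 'Red', 10),
    (3, 'Blue', 7);

-- One result row per team, with every aggregate computed inside the group
SELECT Team,
       COUNT(PlayerID) AS NumPlayers,
       SUM(Goals)      AS TotalGoals,
       MIN(Goals)      AS FewestGoals,
       MAX(Goals)      AS MostGoals,
       AVG(Goals)      AS AvgGoals
FROM Players
GROUP BY Team;
-- Blue: 1 player, 7 goals in total
-- Red:  2 players, 14 goals in total, between 4 and 10 per player, average 7
```
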
To filter on the groups produced by `GROUP BY` (i.e. on the aggregated values), use the `HAVING` keyword. `WHERE` cannot do this because it filters rows before they are grouped.
```SQL
-- Only return account types that have more than 100 accounts
SELECT AccountType, COUNT(AccountID)
FROM Accounts
GROUP BY AccountType
HAVING COUNT(AccountID) > 100;
```
#### COALESCE to default nulls
`COALESCE` returns its first non-null argument, which makes it useful for replacing nulls with a default value.
```SQL
-- Count missing entries as 0 (AVG would otherwise skip nulls)
SELECT AVG(COALESCE(HolidaysTaken, 0))
FROM AnnualLeave;

-- or pick the first non-null of the three phone numbers
SELECT
    CustomerName,
    COALESCE(HomePhone, MobilePhone, BusinessPhone) AS PhoneNumber
FROM Customers;
```

### Joins
#### Inner Join (aka. join)
Combines the columns of two tables and returns only the rows that have a match in both tables.
#### Left Join (aka. left outer join)
Returns all rows of the first (left) table, even if there is no match in the second table. The columns of the second table are then filled with nulls.

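
The difference between the two joins can be sketched as follows. The `Customers` and `Orders` tables, their columns and their rows are assumptions made up for this example; the statements are written with SQLite in mind.

```SQL
-- Hypothetical tables, made up for this example
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Item TEXT);
INSERT INTO Customers (CustomerID, CustomerName) VALUES (1, 'Anna'), (2, 'Ben');
INSERT INTO Orders (OrderID, CustomerID, Item) VALUES (10, 1, 'Book');

-- Inner join: only customers with a matching order (Anna with 'Book')
SELECT c.CustomerName, o.Item
FROM Customers AS c
INNER JOIN Orders AS o ON o.CustomerID = c.CustomerID;

-- Left join: every customer; Ben has no order, so his Item is NULL
SELECT c.CustomerName, o.Item
FROM Customers AS c
LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID;
```
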
### INSERT INTO
Inserts new rows into a database table.
### UPDATE
Changes existing rows of a table.
### DELETE
Deletes all rows that match the `WHERE` condition. Careful, this is dangerous: without a `WHERE` clause every row of the table is deleted!

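
The three data-modifying statements above could look like this in practice. This is a minimal sketch assuming SQLite; the `Accounts` table layout (including the `Balance` column) and its rows are hypothetical.

```SQL
-- Hypothetical table, made up for this example
CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, AccountType TEXT, Balance INTEGER);

-- INSERT INTO: add two rows
INSERT INTO Accounts (AccountID, AccountType, Balance) VALUES
    (1, 'Savings',  100),
    (2, 'Checking',  50);

-- UPDATE: change one existing row, selected by the WHERE clause
UPDATE Accounts SET Balance = 150 WHERE AccountID = 1;

-- Preview the affected rows with SELECT before running the DELETE
SELECT * FROM Accounts WHERE AccountType = 'Checking';
DELETE FROM Accounts WHERE AccountType = 'Checking';

-- Only account 1 remains, now with a balance of 150
SELECT AccountID, Balance FROM Accounts;
```

Running the same `WHERE` condition in a `SELECT` first is a cheap way to check which rows an `UPDATE` or `DELETE` is about to touch.
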
## Examples
- The queries used in Obsidian's Dataview are quite similar.