---
title: SQL
created_date: 2024-10-28
updated_date: 2024-10-28
aliases:
tags:
---
# SQL
SQL stands for Structured Query Language and is used to query and modify the entries of a relational [[Database]].
Many implementations (dialects) exist, such as MySQL and SQLite.
## Crash Course Takeaways
```SQL
SELECT (DISTINCT) [Column names, or * for everything]
FROM table_name
WHERE selection_statements
ORDER BY column_name (DESC)
-- only return 20 entries
LIMIT 20;
```
> [!Info] DISTINCT statement
> `DISTINCT` makes sure that only unique rows are returned. It is used to filter out duplicates, which is an important step when doing statistics on a dataset. Be cautious though: uniqueness is judged only on the selected columns, so depending on the `SELECT` statement it might filter out actual datapoints instead of duplicates (e.g. `SELECT DISTINCT last_name ...` would return only one row for several siblings).
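
The caveat above can be tried out with a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `people` table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical example data: two siblings plus one true duplicate row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Sarah", "Connor"), ("John", "Connor"), ("Sarah", "Connor")],
)

# DISTINCT on last_name alone collapses both siblings into one row.
last_names = conn.execute("SELECT DISTINCT last_name FROM people").fetchall()

# DISTINCT on the full name only removes the true duplicate.
full_names = conn.execute(
    "SELECT DISTINCT first_name, last_name FROM people ORDER BY first_name"
).fetchall()

print(last_names)  # [('Connor',)]
print(full_names)  # [('John', 'Connor'), ('Sarah', 'Connor')]
```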
### Filter
- Put the filter condition after the **WHERE** keyword
- e.g. `WHERE last_name = 'Connor'` or `WHERE age <= 18` or `WHERE nationality IN ('Swiss', 'Italian', 'French')`
- `AND`, `OR`, `NOT` are logical operators to combine or negate conditions
- `WHERE name LIKE '%blue%'` returns any string with blue in it (`%` is a placeholder for any number of characters; case sensitivity depends on the database)
- `WHERE name LIKE '____'` returns all names with exactly 4 characters (`_` is a placeholder for a single character)
- `WHERE name IS (NOT) NULL`: keep only rows where the value is (not) null
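
The filters listed above can be tried out with a small sketch using Python's built-in `sqlite3` module. The `people` table, its rows and the helper `names` are hypothetical and only serve as illustration. Note that SQLite's `LIKE` ignores case for ASCII letters by default, which is why `'%blue%'` matches `Bluebell` here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, nationality TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Anna", 17, "Swiss"),
        ("Luca", 30, "Italian"),
        ("Bluebell", 25, None),
        ("Omar", 12, "Egyptian"),
    ],
)

def names(where_clause):
    """Return the names matching a hard-coded WHERE clause, sorted by name.

    Only for demo strings: real user input belongs in '?' placeholders.
    """
    query = f"SELECT name FROM people WHERE {where_clause} ORDER BY name"
    return [row[0] for row in conn.execute(query)]

minors = names("age <= 18")
swiss_or_italian = names("nationality IN ('Swiss', 'Italian')")
swiss_minors = names("age <= 18 AND nationality = 'Swiss'")
contains_blue = names("name LIKE '%blue%'")
four_chars = names("name LIKE '____'")
no_nationality = names("nationality IS NULL")

print(minors)          # ['Anna', 'Omar']
print(contains_blue)   # ['Bluebell']
print(no_nationality)  # ['Bluebell']
```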
### Sort
You can sort the result with the `ORDER BY` keyword. The default order is ascending (`ASC`); to reverse it, add the `DESC` keyword after the column name.
### CASE Statement
The `CASE` expression lets you transform values on the fly (e.g. to group them into categories) without changing the underlying database entries.
```SQL
SELECT EmployeeName,
       CASE
           WHEN EmpLevel = 1 THEN 'Data Analyst'
           WHEN EmpLevel = 2 THEN 'Middle Manager'
           WHEN EmpLevel = 3 THEN 'Senior Executive'
           ELSE 'Unemployed'
       END AS JobTitle -- name the computed column
FROM Employees;
```
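To illustrate the grouping use case, here is a hedged sketch with Python's built-in `sqlite3` module. The rows and the aliases `JobTitle` and `Headcount` are assumptions for this example. SQLite accepts the alias in `GROUP BY`; other databases may require repeating the whole `CASE` expression there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeName TEXT, EmpLevel INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Ada", 1), ("Ben", 1), ("Cleo", 3), ("Dan", 7)],
)

# Map each level to a label, then count the employees per label.
headcount = conn.execute(
    """
    SELECT CASE
               WHEN EmpLevel = 1 THEN 'Data Analyst'
               WHEN EmpLevel = 2 THEN 'Middle Manager'
               WHEN EmpLevel = 3 THEN 'Senior Executive'
               ELSE 'Unemployed'
           END AS JobTitle,
           COUNT(*) AS Headcount
    FROM Employees
    GROUP BY JobTitle
    ORDER BY JobTitle
    """
).fetchall()

# The stored values are untouched by the CASE expression.
stored_levels = [
    row[0]
    for row in conn.execute("SELECT EmpLevel FROM Employees ORDER BY EmployeeName")
]

print(headcount)      # [('Data Analyst', 2), ('Senior Executive', 1), ('Unemployed', 1)]
print(stored_levels)  # [1, 1, 3, 7]
```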
### Limit Keyword
With the `LIMIT` keyword you can limit the number of rows the query returns. Without an `ORDER BY`, which rows you get back is not guaranteed.
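
Combining `ORDER BY ... DESC` with `LIMIT` gives a simple "top N" query. This is a minimal sketch using Python's built-in `sqlite3` module; the `scores` table and its contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Ann", 40), ("Bob", 75), ("Cy", 60), ("Dee", 90)],
)

# Sort from highest to lowest score, then keep only the first two rows.
top_two = conn.execute(
    "SELECT player, points FROM scores ORDER BY points DESC LIMIT 2"
).fetchall()

print(top_two)  # [('Dee', 90), ('Bob', 75)]
```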
### Functions
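- `COUNT` returns the number of rows of a query: `SELECT COUNT(*) FROM names;` (`COUNT(column_name)` counts only the non-null values of that column)
- `SUM` adds up all values of a column
- `MIN` returns the smallest value of a column
- `MAX` returns the largest value of a column
- `AVG` returns the average of a column (nulls are ignored)

These aggregate functions can also be applied to subgroups of the returned rows. To do this, use `GROUP BY column_name` (similar to `groupby` in the [[Pandas]] library).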
```SQL
-- The example counts the players of each team
SELECT Team, COUNT(PlayerID)
FROM Players
GROUP BY Team;
```
If you want to filter on the aggregated result of each group, use the `HAVING` keyword. `WHERE` filters individual rows before grouping, whereas `HAVING` filters whole groups after the aggregation.
```SQL
-- Only returns account types that have more than 100 accounts
SELECT AccountType, COUNT(AccountID)
FROM Accounts
GROUP BY AccountType
HAVING COUNT(AccountID) > 100;
```
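The difference between `WHERE` and `HAVING` is easiest to see on a tiny dataset. The following is a sketch with Python's built-in `sqlite3` module; the `Balance` column, the rows and the thresholds are assumptions made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (AccountID INTEGER, AccountType TEXT, Balance REAL)"
)
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [
        (1, "Savings", 100.0),
        (2, "Savings", 300.0),
        (3, "Checking", 50.0),
        (4, "Savings", 200.0),
        (5, "Checking", 150.0),
        (6, "Loan", 1000.0),
    ],
)

# One result row per account type, with a count and an average per group.
per_type = conn.execute(
    """
    SELECT AccountType, COUNT(AccountID), AVG(Balance)
    FROM Accounts
    GROUP BY AccountType
    ORDER BY AccountType
    """
).fetchall()

# HAVING drops whole groups after aggregation (here: types with < 2 accounts).
big_types = conn.execute(
    """
    SELECT AccountType, COUNT(AccountID)
    FROM Accounts
    GROUP BY AccountType
    HAVING COUNT(AccountID) >= 2
    ORDER BY AccountType
    """
).fetchall()

# WHERE drops single rows before grouping, so Checking is left with 1 account.
big_types_min_balance = conn.execute(
    """
    SELECT AccountType, COUNT(AccountID)
    FROM Accounts
    WHERE Balance >= 100
    GROUP BY AccountType
    HAVING COUNT(AccountID) >= 2
    ORDER BY AccountType
    """
).fetchall()

print(per_type)   # [('Checking', 2, 100.0), ('Loan', 1, 1000.0), ('Savings', 3, 200.0)]
print(big_types)  # [('Checking', 2), ('Savings', 3)]
print(big_types_min_balance)  # [('Savings', 3)]
```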
#### COALESCE to default nulls
`COALESCE` returns the first of its arguments that is not null, so it can be used to replace nulls with a default value.
```SQL
-- treat missing values as 0 (AVG would otherwise skip the nulls)
SELECT AVG(COALESCE(HolidaysTaken, 0))
FROM AnnualLeave;
-- or pick the first of the 3 phone numbers that is not null
SELECT
    CustomerName,
    COALESCE(HomePhone, MobilePhone, BusinessPhone) AS PhoneNumber
FROM Customers;
```
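Whether nulls should count as 0 changes the result of an average. This is a minimal sketch with Python's built-in `sqlite3` module; the `Employee` column and the three rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AnnualLeave (Employee TEXT, HolidaysTaken INTEGER)")
conn.executemany(
    "INSERT INTO AnnualLeave VALUES (?, ?)",
    [("Ada", 10), ("Ben", 20), ("Cleo", None)],  # Cleo has no entry yet
)

# AVG skips nulls: (10 + 20) / 2. With COALESCE: (10 + 20 + 0) / 3.
avg_ignoring_nulls, avg_nulls_as_zero = conn.execute(
    "SELECT AVG(HolidaysTaken), AVG(COALESCE(HolidaysTaken, 0)) FROM AnnualLeave"
).fetchone()

print(avg_ignoring_nulls)  # 15.0
print(avg_nulls_as_zero)   # 10.0
```

Which of the two numbers is the right one depends on what a null means in the data (unknown vs. none taken).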
### Joins
#### Inner Join (aka. join)
Combines the columns of two tables and returns only those rows for which the join condition finds a match in both tables.
#### Left Join (aka. left outer join)
This join includes all rows of the first (left) table even if there is no match in the second table. For rows without a match, the columns of the second table are filled with nulls.
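
The two join types can be compared side by side in a small sketch with Python's built-in `sqlite3` module. The `Players` and `Teams` tables and the `TeamID` key are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Players (PlayerID INTEGER, Name TEXT, TeamID INTEGER)")
conn.execute("CREATE TABLE Teams (TeamID INTEGER, TeamName TEXT)")
conn.executemany(
    "INSERT INTO Players VALUES (?, ?, ?)",
    [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", None)],  # Cy has no team
)
conn.executemany(
    "INSERT INTO Teams VALUES (?, ?)",
    [(10, "Red"), (20, "Blue"), (30, "Green")],  # Green has no players
)

# INNER JOIN: only players whose TeamID matches a team.
inner = conn.execute(
    """
    SELECT p.Name, t.TeamName
    FROM Players AS p
    INNER JOIN Teams AS t ON p.TeamID = t.TeamID
    ORDER BY p.PlayerID
    """
).fetchall()

# LEFT JOIN: every player; the team name is NULL (None) without a match.
left = conn.execute(
    """
    SELECT p.Name, t.TeamName
    FROM Players AS p
    LEFT JOIN Teams AS t ON p.TeamID = t.TeamID
    ORDER BY p.PlayerID
    """
).fetchall()

print(inner)  # [('Ann', 'Red'), ('Bob', 'Blue')]
print(left)   # [('Ann', 'Red'), ('Bob', 'Blue'), ('Cy', None)]
```

The team `Green` shows up in neither result, because it only exists in the second (right) table.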
### INSERT INTO
`INSERT INTO` adds new rows to a database table.
### UPDATE
`UPDATE` changes the values of existing rows. Use a `WHERE` clause to select which rows are changed.
### DELETE
`DELETE` removes every row that matches its `WHERE` clause. Careful, this is dangerous: without a `WHERE` clause all rows of the table are deleted!
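
The three write statements can be sketched with Python's built-in `sqlite3` module. The `Customers` columns and rows are hypothetical. Running a `SELECT` with the same `WHERE` clause first is one way to preview what a `DELETE` would remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    "CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT)"
)

# INSERT INTO: add rows (CustomerID is assigned automatically by SQLite).
conn.execute("INSERT INTO Customers (CustomerName, City) VALUES ('Ann', 'Bern')")
conn.execute("INSERT INTO Customers (CustomerName, City) VALUES ('Bob', 'Rome')")
conn.execute("INSERT INTO Customers (CustomerName, City) VALUES ('Cy', 'Rome')")

# UPDATE: change only the rows selected by WHERE.
conn.execute("UPDATE Customers SET City = 'Paris' WHERE CustomerName = 'Bob'")

# Preview the rows first, then DELETE with exactly the same WHERE clause.
to_delete = conn.execute(
    "SELECT CustomerName FROM Customers WHERE City = 'Rome'"
).fetchall()
deleted_count = conn.execute("DELETE FROM Customers WHERE City = 'Rome'").rowcount

remaining = conn.execute(
    "SELECT CustomerName, City FROM Customers ORDER BY CustomerID"
).fetchall()

print(to_delete)      # [('Cy',)]
print(deleted_count)  # 1
print(remaining)      # [('Ann', 'Bern'), ('Bob', 'Paris')]
```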
## Examples
- The queries used in Obsidian's Dataview plugin are quite similar to SQL.