title	created_date	updated_date	aliases	tags
SQL	2024-10-28	2024-10-28

SQL

SQL stands for Structured Query Language and is used to query and modify the entries of a relational database. Many different implementations exist, such as MySQL and SQLite.

Crash Course Takeaways

SELECT (DISTINCT) [column names, or * for everything]
FROM table_name
WHERE selection_statements
ORDER BY column_name (DESC)
-- only return 20 entries
LIMIT 20;

[!Info] DISTINCT keyword DISTINCT makes sure that only unique rows are returned. It is used to filter out duplicates, which is an important step when computing statistics on a dataset. Be cautious though: uniqueness is judged only on the selected columns, so it might filter out actual datapoints instead of duplicates (e.g. SELECT DISTINCT last_name ... returns a last name shared by several siblings only once).
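The sibling pitfall can be reproduced with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database; the people table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical table and rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("Sarah", "Connor"),
        ("John", "Connor"),   # a different person with the same last name
        ("Sarah", "Connor"),  # a true duplicate row
    ],
)

# DISTINCT on one column collapses everyone who shares a last name.
last_names = conn.execute("SELECT DISTINCT last_name FROM people").fetchall()

# DISTINCT on both columns only removes the true duplicate.
full_names = conn.execute(
    "SELECT DISTINCT first_name, last_name FROM people ORDER BY first_name"
).fetchall()

print(last_names)
print(full_names)
conn.close()
```

Selecting more columns makes DISTINCT stricter about what counts as a duplicate, which is usually what you want when rows describe individual people.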

Filter

Use a selection statement (a condition) after the WHERE keyword
e.g. WHERE last_name = 'Connor' or WHERE age <= 18 or WHERE nationality IN ('Swiss', 'Italian', 'French')
AND, OR, NOT are logical operators for combining conditions
WHERE name LIKE '%blue%' returns any name with blue in it (% matches any sequence of characters, including none)
WHERE name LIKE '____' returns all names with exactly 4 characters (_ matches exactly one character)
WHERE name IS NULL keeps only the rows where name is null; WHERE name IS NOT NULL filters the nulls out
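The filters above can be tried out with a minimal sketch using Python's sqlite3 module. The people table, its rows, and the helper names_where are assumptions made for this example. Note that SQLite's LIKE ignores ASCII case by default; other databases may behave differently.

```python
import sqlite3

# Hypothetical table and rows, made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, nationality TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Anna", 17, "Swiss"),
        ("Bluebell", 30, "Italian"),
        ("Marc", 45, "German"),
        (None, 12, "French"),  # a row with a missing name
    ],
)


def names_where(condition):
    """Return the names of all rows matching a WHERE condition, youngest first.

    The condition is pasted into the SQL text, which is only acceptable here
    because every condition is a fixed string written in this script.
    """
    query = f"SELECT name FROM people WHERE {condition} ORDER BY age"
    return [row[0] for row in conn.execute(query)]


minors = names_where("age <= 18 AND name IS NOT NULL")
in_list = names_where("nationality IN ('Swiss', 'Italian')")
contains_blue = names_where("name LIKE '%blue%'")  # matches "Bluebell" in SQLite
four_chars = names_where("name LIKE '____'")
missing_name = names_where("name IS NULL")

print(minors, in_list, contains_blue, four_chars, missing_name)
conn.close()
```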

Sort

You can sort the result with the ORDER BY keyword. The default order is ascending (ASC); to reverse it, add the DESC keyword after the column name.
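A short sketch of both sort directions, again with Python's sqlite3 module and a made-up people table:

```python
import sqlite3

# Hypothetical table and rows for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Anna", 17), ("Marc", 45), ("Lena", 30)],
)

# Ascending is the default; DESC reverses the order.
youngest_first = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY age")]
oldest_first = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY age DESC")]

print(youngest_first)
print(oldest_first)
conn.close()
```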

CASE Statement

The CASE expression lets you transform values on the fly (e.g. mapping codes to labels or grouping values) without changing the underlying database entries.

SELECT EmployeeName,
CASE
	WHEN EmpLevel = 1 THEN 'Data Analyst'
	WHEN EmpLevel = 2 THEN 'Middle Manager'
	WHEN EmpLevel = 3 THEN 'Senior Executive'
	ELSE 'Unemployed'
END AS JobTitle -- name the computed column
FROM Employees;

Limit Keyword

With the LIMIT keyword you can limit the number of rows the query returns.
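LIMIT is most useful together with ORDER BY, because without a defined order it is not guaranteed which rows are kept. A minimal "top 2" sketch with Python's sqlite3 module; the scores table and its values are invented for this example:

```python
import sqlite3

# Hypothetical table and rows for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Anna", 70), ("Marc", 95), ("Lena", 88), ("Timo", 60)],
)

# Sort by points, highest first, and keep only the first two rows.
top_two = [
    row[0]
    for row in conn.execute("SELECT name FROM scores ORDER BY points DESC LIMIT 2")
]

print(top_two)
conn.close()
```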

Functions

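COUNT: returns the number of rows of a query, e.g. SELECT COUNT(*) FROM names;
SUM: adds up the values of a numeric column
MIN: returns the smallest value of a column
MAX: returns the largest value of a column
AVG: returns the average of a numeric column (nulls are ignored)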

These aggregate functions can also be applied to subgroups of the returned rows. To achieve this, use the GROUP BY column_name clause (similar to groupby in the pandas library).

-- The example counts players from the same team
SELECT Team, COUNT(PlayerID)
FROM Players
GROUP BY Team;

WHERE filters individual rows before they are grouped. If you want to filter the groups themselves, based on an aggregated value, use the HAVING keyword after GROUP BY.

-- Only return account types that have more than 100 accounts
SELECT AccountType, COUNT(AccountID)
FROM Accounts
GROUP BY AccountType
HAVING COUNT(AccountID) > 100;
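The difference between the two filters is easier to see when a single query uses both. The sketch below runs against an in-memory SQLite database through Python's sqlite3 module. The IsActive column, the sample rows, and the threshold of 2 (instead of 100) are assumptions made to keep the example small.

```python
import sqlite3

# Hypothetical table and rows; IsActive is an invented column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (AccountID INTEGER, AccountType TEXT, IsActive INTEGER)"
)
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [
        (1, "Savings", 1),
        (2, "Savings", 1),
        (3, "Savings", 0),   # dropped by WHERE before counting
        (4, "Checking", 1),  # its group is dropped by HAVING (only 1 account)
        (5, "Loan", 1),
        (6, "Loan", 1),
    ],
)

busy_types = conn.execute(
    """
    SELECT AccountType, COUNT(AccountID)
    FROM Accounts
    WHERE IsActive = 1            -- row filter, applied before grouping
    GROUP BY AccountType
    HAVING COUNT(AccountID) >= 2  -- group filter, applied after counting
    ORDER BY AccountType
    """
).fetchall()

print(busy_types)
conn.close()
```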

COALESCE to default nulls

COALESCE returns its first non-null argument, so it can be used to replace nulls with a default value.

SELECT AVG(COALESCE(HolidaysTaken, 0))
FROM AnnualLeave;

-- or return the first available of the three phone numbers
SELECT
	CustomerName,
	COALESCE(HomePhone, MobilePhone, BusinessPhone) AS PhoneNumber
FROM Customers;
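Why the default matters for the average: AVG skips nulls, so treating a null as 0 changes the result. A small sketch with Python's sqlite3 module and invented AnnualLeave rows:

```python
import sqlite3

# Hypothetical rows: one employee has no HolidaysTaken value recorded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AnnualLeave (Employee TEXT, HolidaysTaken INTEGER)")
conn.executemany(
    "INSERT INTO AnnualLeave VALUES (?, ?)",
    [("Anna", 10), ("Marc", None), ("Lena", 20)],
)

# Without COALESCE the null row is ignored: (10 + 20) / 2.
# With COALESCE it counts as 0: (10 + 0 + 20) / 3.
avg_ignoring_nulls, avg_nulls_as_zero = conn.execute(
    "SELECT AVG(HolidaysTaken), AVG(COALESCE(HolidaysTaken, 0)) FROM AnnualLeave"
).fetchone()

print(avg_ignoring_nulls, avg_nulls_as_zero)
conn.close()
```

Which of the two averages is correct depends on what a null means in the data: "no holidays taken" or "value unknown".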

Joins

Inner Join (aka. join)

Combines the columns of two tables and returns only the rows for which the join condition finds a match in both tables.

Left Join (aka. left outer join)

This join includes all rows of the first (left) table even if there is no match in the second table. For rows without a match, the columns of the second table are filled with NULL.
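Both join types can be compared side by side in a minimal sketch using Python's sqlite3 module. The Customers and Orders tables and their rows are hypothetical.

```python
import sqlite3

# Hypothetical tables: Marc is a customer without any order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, Item TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Anna"), (2, "Marc")])
conn.execute("INSERT INTO Orders VALUES (?, ?, ?)", (1, 1, "Book"))

# Inner join: only customers that have a matching order.
inner_rows = conn.execute(
    """
    SELECT c.CustomerName, o.Item
    FROM Customers AS c
    JOIN Orders AS o ON o.CustomerID = c.CustomerID
    ORDER BY c.CustomerID
    """
).fetchall()

# Left join: every customer; Item is NULL (None in Python) without a match.
left_rows = conn.execute(
    """
    SELECT c.CustomerName, o.Item
    FROM Customers AS c
    LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID
    ORDER BY c.CustomerID
    """
).fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```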

INSERT INTO

is used to insert new rows into a database table
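A minimal sketch of INSERT INTO, run through Python's sqlite3 module. The Players table layout and the rows are assumptions for this example. The ? placeholders let the driver pass the values safely instead of pasting them into the SQL text.

```python
import sqlite3

# Hypothetical table; in SQLite an INTEGER PRIMARY KEY is assigned automatically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Players (PlayerID INTEGER PRIMARY KEY, Name TEXT, Team TEXT)")

# Insert a single row, naming the columns that receive values.
conn.execute("INSERT INTO Players (Name, Team) VALUES (?, ?)", ("Anna", "Red"))

# Insert several rows with the same statement.
conn.executemany(
    "INSERT INTO Players (Name, Team) VALUES (?, ?)",
    [("Marc", "Red"), ("Lena", "Blue")],
)
conn.commit()

players = conn.execute(
    "SELECT PlayerID, Name, Team FROM Players ORDER BY PlayerID"
).fetchall()

print(players)
conn.close()
```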

UPDATE

is used to change existing database entries. The WHERE clause selects which rows are changed; without it, every row in the table is updated.
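A short UPDATE sketch with Python's sqlite3 module. The Employees rows are made up; the cursor's rowcount reports how many rows the statement changed, which is a cheap sanity check.

```python
import sqlite3

# Hypothetical table and rows for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeName TEXT, EmpLevel INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Anna", 1), ("Marc", 1), ("Lena", 2)],
)

# Promote exactly one employee; the WHERE clause restricts the change.
cursor = conn.execute(
    "UPDATE Employees SET EmpLevel = ? WHERE EmployeeName = ?",
    (2, "Anna"),
)
changed = cursor.rowcount
conn.commit()

levels = dict(conn.execute("SELECT EmployeeName, EmpLevel FROM Employees"))

print(changed, levels)
conn.close()
```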

DELETE

is used to delete all rows that match the WHERE condition. Careful, this is dangerous: without a WHERE clause, every row in the table is deleted!
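One cautious pattern is to preview the affected rows with a SELECT that uses the same condition, and to commit only after checking the result. The sketch below assumes Python's sqlite3 module with its default transaction handling and a made-up Players table. It also shows that a DELETE without WHERE empties the table, and that a rollback undoes it as long as nothing was committed.

```python
import sqlite3

# Hypothetical table and rows for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Players (Name TEXT, Team TEXT)")
conn.executemany(
    "INSERT INTO Players VALUES (?, ?)",
    [("Anna", "Red"), ("Marc", "Red"), ("Lena", "Blue")],
)
conn.commit()


def player_count():
    """Return the current number of rows in Players."""
    return conn.execute("SELECT COUNT(*) FROM Players").fetchone()[0]


# Preview: how many rows would the condition hit?
to_delete = conn.execute(
    "SELECT COUNT(*) FROM Players WHERE Team = ?", ("Blue",)
).fetchone()[0]

# The dangerous variant: no WHERE clause removes every row.
conn.execute("DELETE FROM Players")
count_after_unfiltered_delete = player_count()
conn.rollback()  # nothing was committed, so the rows come back
count_after_rollback = player_count()

# The intended variant: delete only the matching rows, then commit.
deleted = conn.execute("DELETE FROM Players WHERE Team = ?", ("Blue",)).rowcount
conn.commit()
remaining = player_count()

print(to_delete, count_after_unfiltered_delete, count_after_rollback, deleted, remaining)
conn.close()
```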

Examples

The queries used in Obsidian's Dataview plugin are quite similar.
