---
title: SQL
created_date: 2024-10-28
updated_date: 2024-10-28
aliases:
tags:
---
# SQL

SQL stands for Structured Query Language and is used to retrieve and manipulate entries in a [[Database]]. Many different implementations exist, such as MySQL and SQLite.

## Crash Course Takeaways

```SQL
SELECT (DISTINCT) [column names, or * for everything]
FROM table_name
WHERE selection_statements
ORDER BY column_name (DESC)
-- only return 20 entries
LIMIT 20;
```

> [!Info] DISTINCT statement
> This makes sure that only unique rows are returned. It is used to filter out duplicates, which is an important step when computing statistics on a dataset. Be cautious though: depending on the selected columns, it might filter out actual data points instead of duplicates (e.g. `SELECT DISTINCT last_name ...` would return only one row for several siblings).

### Filter

- Use selection statements after the **WHERE** keyword
- e.g. `WHERE last_name = 'Connor'` or `WHERE age <= 18` or `WHERE nationality IN ('Swiss', 'Italian', 'French')` (string literals use single quotes in standard SQL)
- `AND`, `OR`, `NOT` are logical operators
- `WHERE name LIKE '%blue%'` returns any string containing "blue" (`%` is a placeholder for any sequence of characters, including none)
- `WHERE name LIKE '____'` returns all names with exactly 4 characters (`_` is a placeholder for exactly one character)
- `WHERE name IS (NOT) NULL`: `IS NULL` keeps only the nulls, `IS NOT NULL` filters them out

### Sort

You can sort with the keyword `ORDER BY`. To reverse the order (descending), add the `DESC` keyword after the column name.

### CASE Statement

The `CASE` statement lets you transform values on the fly (e.g. to group them into categories) without changing the underlying database entries.

```SQL
SELECT EmployeeName,
  CASE
    WHEN EmpLevel = 1 THEN 'Data Analyst'
    WHEN EmpLevel = 2 THEN 'Middle Manager'
    WHEN EmpLevel = 3 THEN 'Senior Executive'
    ELSE 'Unemployed'
  END
FROM Employees;
```

### Limit Keyword

With the `LIMIT` keyword you can limit the number of rows the query returns.
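To try the clauses above without installing a database server, one option is Python's built-in `sqlite3` module with an in-memory SQLite database. The sketch below assumes a hypothetical `people` table filled with made-up rows; details such as `LIKE` case sensitivity can differ between SQL implementations.

```python
import sqlite3

# Hypothetical table "people" with made-up rows: first_name, last_name, age, nationality
ROWS = [
    ("Sarah", "Connor", 35, "Swiss"),
    ("John", "Connor", 12, "Swiss"),
    ("Luca", "Rossi", 17, "Italian"),
    ("Marie", "Dubois", 40, "French"),
    ("Anna", "Weber", 28, None),  # nationality unknown -> stored as NULL
]

QUERIES = {
    # DISTINCT keeps only one "Connor" although there are two siblings
    "distinct_last_names": "SELECT DISTINCT last_name FROM people ORDER BY last_name",
    # IN and a comparison combined with AND; DESC sorts the oldest first
    "adult_europeans": """
        SELECT first_name FROM people
        WHERE nationality IN ('Swiss', 'Italian', 'French') AND age >= 18
        ORDER BY age DESC
    """,
    # Four underscores: names with exactly four characters
    "four_letter_names": "SELECT first_name FROM people WHERE first_name LIKE '____' ORDER BY first_name",
    # NULL has to be tested with IS NULL, not with = NULL
    "unknown_nationality": "SELECT first_name FROM people WHERE nationality IS NULL",
    # CASE labels rows on the fly; LIMIT keeps only the three youngest
    "age_groups": """
        SELECT first_name,
               CASE WHEN age < 18 THEN 'minor' ELSE 'adult' END AS age_group
        FROM people
        ORDER BY age
        LIMIT 3
    """,
}


def run_queries():
    """Load ROWS into an in-memory SQLite database and run every query in QUERIES."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE people (first_name TEXT, last_name TEXT, age INTEGER, nationality TEXT)"
        )
        conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)
        return {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
    finally:
        conn.close()


if __name__ == "__main__":
    for name, rows in run_queries().items():
        print(name, rows)
```

Note that Anna does not show up in `adult_europeans`: `IN` compared against a `NULL` nationality is not true, so the row is filtered out.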
### Functions

- `COUNT` returns the number of rows of a query: `SELECT COUNT(*) FROM names;`
- `SUM`: sums all values of a column
- `MIN`: returns the smallest value
- `MAX`: returns the largest value
- `AVG`: returns the average (arithmetic mean)
- These aggregate functions can be applied to subgroups of the returned rows. To achieve this, the `GROUP BY column_name` clause is used (similar to `groupby` in the [[Pandas]] library).

```SQL
-- Counts the players in each team
SELECT Team, COUNT(PlayerID)
FROM Players
GROUP BY Team;
```

If you want to filter the groups produced by `GROUP BY` (i.e. after the aggregation), use the `HAVING` keyword. `WHERE`, in contrast, filters individual rows before they are grouped.

```SQL
-- Only returns account types that have more than 100 accounts
SELECT AccountType, COUNT(AccountID)
FROM Accounts
GROUP BY AccountType
HAVING COUNT(AccountID) > 100;
```

#### COALESCE to default nulls

`COALESCE` returns its first non-null argument, which makes it useful for replacing nulls with a default value.

```SQL
-- Count missing holiday entries as 0 (AVG alone would skip the NULLs)
SELECT AVG(COALESCE(HolidaysTaken, 0))
FROM AnnualLeave;

-- or pick the first available of the three phone numbers
SELECT CustomerName, COALESCE(HomePhone, MobilePhone, BusinessPhone) AS PhoneNumber
FROM Customers;
```

### Joins

#### Inner Join (aka. join)

Combines the rows of two tables and keeps only those rows for which the join condition finds a match in both tables.

#### Left Join (aka. left outer join)

This join includes all rows of the first (left) table even if there is no match in the second table; the columns of the second table are then filled with `NULL`.

### INSERT INTO

`INSERT INTO` is used to insert rows into a database table.

### UPDATE

`UPDATE` is used to change existing database entries (the rows matched by its `WHERE` clause).

### DELETE

`DELETE` is used to delete all entries matched by its `WHERE` clause. Careful, this is dangerous: without a `WHERE` clause, every row of the table is deleted!

## Examples

- The queries used in Obsidian's Dataview are quite similar.
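As a hands-on example, the aggregate functions, `GROUP BY`/`HAVING`, `COALESCE`, both join types and the data-modifying statements can be tried with Python's built-in `sqlite3` module. This is a minimal sketch assuming SQLite; the `teams` and `players` tables and their data are hypothetical.

```python
import sqlite3

# Hypothetical schema with made-up data
SCHEMA = """
CREATE TABLE teams (team_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT, team_id INTEGER, goals INTEGER);
INSERT INTO teams VALUES (1, 'Red'), (2, 'Blue'), (3, 'Green');
INSERT INTO players VALUES
    (1, 'Ann', 1, 5),
    (2, 'Ben', 1, NULL),  -- goals unknown
    (3, 'Cem', 2, 3),
    (4, 'Dan', NULL, 2);  -- not in any team
"""


def run_examples():
    """Run aggregate, join and data-modifying statements on an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)

        def q(sql):
            return conn.execute(sql).fetchall()

        out = {
            # Aggregates skip NULLs, except COUNT(*), which counts rows
            "aggregates": q("SELECT COUNT(*), SUM(goals), MIN(goals), MAX(goals) FROM players"),
            # GROUP BY builds one group per team; HAVING filters the groups afterwards
            "big_teams": q("""
                SELECT team_id, COUNT(player_id) FROM players
                WHERE team_id IS NOT NULL
                GROUP BY team_id
                HAVING COUNT(player_id) > 1
            """),
            # AVG skips NULLs; COALESCE turns them into 0 before averaging
            "averages": q("SELECT AVG(goals), AVG(COALESCE(goals, 0)) FROM players"),
            # INNER JOIN: only players whose team_id matches a team
            "inner": q("""
                SELECT p.name, t.name FROM players AS p
                JOIN teams AS t ON p.team_id = t.team_id
                ORDER BY p.name
            """),
            # LEFT JOIN: every player; team name is NULL (None) where nothing matches
            "left": q("""
                SELECT p.name, t.name FROM players AS p
                LEFT JOIN teams AS t ON p.team_id = t.team_id
                ORDER BY p.name
            """),
        }

        # INSERT a row, UPDATE the NULL goals, DELETE players without a team
        conn.execute("INSERT INTO players (player_id, name, team_id, goals) VALUES (5, 'Eva', 3, 1)")
        conn.execute("UPDATE players SET goals = 0 WHERE goals IS NULL")
        conn.execute("DELETE FROM players WHERE team_id IS NULL")  # no WHERE would delete every row
        out["after_changes"] = q("SELECT name, goals FROM players ORDER BY player_id")
        return out
    finally:
        conn.close()


if __name__ == "__main__":
    for name, rows in run_examples().items():
        print(name, rows)
```

The two averages differ (10/3 versus 2.5) only because of how Ben's unknown `goals` value is treated, which is exactly the choice `COALESCE` makes explicit. The team "Green" has no players and therefore appears in neither join, since both joins start from `players`.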