Advanced SQL
Aggregate Functions
Aggregation in SQL are a standard and very useful extension of FOL.
Aggregate (column) functions are introduced to:
- Count the number of tuples in a relation;
- Sum values of a numeric attribute over a relation; and
- Find minimum or maximum values of a numeric attribute over a relation.
They can apply to groups of tuples that have equal values for selected attributes.
The SQL COUNT()
, AVG()
and SUM()
Functions are super useful.
However, be super careful, you probably need to do a GROUP BY
afterwards.
The general syntax is the following:
More covered below.
Group By
Order of operations:
- Run
<FROM-WHERE>
- Run
GROUP BY
to partition the results of<FROM-WHERE>
- On on each of these partitioned groups, apply the aggregate functions.
- For each group, add a tuple with the grouping attribute values and the results of the aggregate functions to the result.
https://www.w3schools.com/sql/sql_groupby.asp
COUNT(CustomerID) | Country |
---|---|
3 | Argentina |
2 | Austria |
2 | Belgium |
9 | Brazil |
3 | Canada |
2 | Denmark |
2 | Finland |
Without group by, you would have this
COUNT(CustomerID) | Country |
---|---|
91 | Germany |
So it seems to just give a country?
Danger
When you use aggregate functions like
COUNT()
,SUM()
,AVG()
, etc., along with other columns that are not part of the aggregate function, you need to use theGROUP BY
clause.
Here’s an example where GROUP BY
doesn’t work:
- For each author, a count of the number of article pages.
AUTHOR PCNT
----------- -----------
1 12
1 record(s) selected.
Not quite correct: Author 2 should be reported as having 0 pages. This is because of the way the cross join works, see SQL.
Having
https://www.w3schools.com/sql/sql_having.asp
The HAVING
clause was added to SQL because the WHERE
keyword cannot be used with aggregate functions.
The following SQL statement lists the number of customers in each country. Only include countries with more than 5 customers:
All publications with more than one author.
PUBLICATION ACNT
----------- -----------
2 2
1 record(s) selected.
SQL Order By
Syntax:
where `Diri is either ASC or DESC.
ASC
is the default
Warning
If you use
ORDER BY
, make sure that the column name is correct first. Ideally, you should rename it to prevent errors. Using the aggregate function name won’t work, because it will just be relabeled to a column number.
For example,
You can also order by several columns
SQL Limit
The number of results of a query can be limited by appending a LIMIT
clause to a query. You can also use the OFFSET
keyword to skip the first e2
answers.
List at most the first two entries of a sorted list of authors by name.
List at most the next two entries following the second entry of a sorted list of authors by name.
SQL Join
See SQL Joins.
Extension of syntax for the FROM
clause:
- where
<j-type>
is one ofFULL
,LEFT
,RIGHT
, orINNER
Example
Counting with outer join: Author ids and a count of the number of publications for each.
SQL Multisets
The Answers to SELECT blocks are actually multisets of tuples (remember, this is how we can apply aggregate functions). Therefore, we can actually apply multiset operations on them.
- Multiset union:
UNION ALL
.- behaves like “∨” in range restricted RC with multiset semantics
- Multiset difference:
EXCEPT ALL
.- behaves like “∧¬” in range restricted RC with multiset semantics
- Multiset intersection:
INTERSECT ALL
.- maximum number of tuples common to and
- rarely used