Advanced SQL

Aggregate Functions

Aggregation in SQL are a standard and very useful extension of FOL.

Aggregate (column) functions are introduced to:

  1. Count the number of tuples in a relation;
  2. Sum values of a numeric attribute over a relation; and
  3. Find minimum or maximum values of a numeric attribute over a relation.

They can apply to groups of tuples that have equal values for selected attributes.

The SQL COUNT(), AVG() and SUM() Functions are super useful.

However, be super careful, you probably need to do a GROUP BY afterwards.

The general syntax is the following:

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s)
LIMIT X HAVING X;

More covered below.

Group By

Order of operations:

  1. Run <FROM-WHERE>
  2. Run GROUP BY to partition the results of <FROM-WHERE>
  3. On on each of these partitioned groups, apply the aggregate functions.
  4. For each group, add a tuple with the grouping attribute values and the results of the aggregate functions to the result.

https://www.w3schools.com/sql/sql_groupby.asp

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
COUNT(CustomerID)Country
3Argentina
2Austria
2Belgium
9Brazil
3Canada
2Denmark
2Finland

Without group by, you would have this

SELECT COUNT(CustomerID), Country
FROM Customers
COUNT(CustomerID)Country
91Germany

So it seems to just give a country?

Danger

When you use aggregate functions like COUNT(), SUM(), AVG(), etc., along with other columns that are not part of the aggregate function, you need to use the GROUP BY clause.

Here’s an example where GROUP BY doesn’t work:

  • For each author, a count of the number of article pages.
select author, sum(endpage-startpage+1) as pcnt \
from wrote, article \
where publication = pubid \
group by author
	AUTHOR         PCNT
----------- -----------
		  1          12

1 record(s) selected.

Not quite correct: Author 2 should be reported as having 0 pages. This is because of the way the cross join works, see SQL.

Having

https://www.w3schools.com/sql/sql_having.asp

The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate functions.

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s)
LIMIT X HAVING X;

The following SQL statement lists the number of customers in each country. Only include countries with more than 5 customers:

SELECT COUNT(CustomerID), Country  
FROM Customers  
GROUP BY Country  
HAVING COUNT(CustomerID) > 5;

All publications with more than one author.

select publication, count(author) as acnt \
from wrote \
group by publication \
having count(author) > 1
PUBLICATION ACNT
----------- -----------
		   2          2

1 record(s) selected.

SQL Order By

Syntax:

ORDER BY e1 [Dir1], . . . , ek [Dirk]

where `Diri is either ASC or DESC.

  • ASC is the default

Warning

If you use ORDER BY, make sure that the column name is correct first. Ideally, you should rename it to prevent errors. Using the aggregate function name won’t work, because it will just be relabeled to a column number.

For example,

select author, MAX(publication) as maximum \
from wrote \
group by author \
order by maximum DESC
SELECT * FROM Customers  
ORDER BY Country DESC;

You can also order by several columns

SELECT * FROM Customers  
ORDER BY Country, CustomerName;

SQL Limit

The number of results of a query can be limited by appending a LIMIT clause to a query. You can also use the OFFSET keyword to skip the first e2 answers.

<query> LIMIT e1 [ OFFSET e2 ]

List at most the first two entries of a sorted list of authors by name.

SELECT DISTINCT * FROM author \
ORDER BY name \
LIMIT 2

List at most the next two entries following the second entry of a sorted list of authors by name.

SELECT DISTINCT * FROM author \
ORDER BY name \
LIMIT 2 OFFSET 2

SQL Join

See SQL Joins. Extension of syntax for the FROM clause:

FROM T1 <j-type> JOIN T2 ON <cond>
  • where <j-type> is one of FULL, LEFT, RIGHT, or INNER

Example

select aid, publication \
from author left join wrote on aid=author

Counting with outer join: Author ids and a count of the number of publications for each.

select aid, count(publication) as pcnt \
from author left join wrote on aid = author \
group by aid

SQL Multisets

The Answers to SELECT blocks are actually multisets of tuples (remember, this is how we can apply aggregate functions). Therefore, we can actually apply multiset operations on them.

  • Multiset union: UNION ALL .
    • behaves like “∨” in range restricted RC with multiset semantics
  • Multiset difference: EXCEPT ALL .
    • behaves like “∧¬” in range restricted RC with multiset semantics
  • Multiset intersection: INTERSECT ALL .
    • maximum number of tuples common to and
    • rarely used