When working with databases and doing general data investigation the GROUP BY clause is going to become a part of your regular vocabulary. It’s very useful when writing SQL – whether it’s for reporting or trouble-shooting data issues and It’s especially useful for finding duplicates.
Note: When writing a SQL statement with a GROUP BY clause, you cannot or should not select more columns than you are grouping by.
Some databases allow you to select more columns than selected in a GROUP BY such as the DBeaver SQLite sample database, but if you don’t know the data inside and out this is a recipe for disaster. Usually the SQL editor will display an error from your database as you’ve selected more columns that you are grouping by.
Let’s give an example of using a GROUP BY. Let’s say that we want to know how many employees we have at their respective title within our organization. We could write a query like this:
SELECT TITLE, COUNT(*)
FROM EMPLOYEE E
— WHERE TITLE = ‘IT Staff’ — Example to show that a WHERE clause must come before GROUP BY
GROUP BY TITLE;
Looking At Practical Examples Of GROUP BY
We can also rename the COUNT(*) to something more relevant:
Let’s say instead of getting an employee count, we want to get the invoice totals by state. We could use the GROUP BY with a SUM:
The number of aggregate functions can vary based on the database, but other aggregate functions that can be used with a SQL GROUP BY include:
- AVG()
- MAX()
- MIN()
These sorts of functions can answer a wide variety of questions such as:
- Which company is bringing us the most income?
- Where are most of our companies located?
- What is the average amount invoiced per customer?
- How many employees live outside of the U.S.?
This list of questions is never-ending, but hopefully you can see how these sorts of questions can be very useful. These types of data questions can offer insights to a company. Similarly, insights to a business are also referred to as Business Intelligence.