GROUP BY
is an SQL statement used to group database records based on the column selected as the primary data point. In other words, you can use GROUP BY
to extract table data based on things like numerical sums, but also use together with IF
statements to execute complex queries.
Let’s create a sample table with some data to use as an example.
CREATE TABLE blog_authors (
id INTEGER,
username VARCHAR(128),
category VARCHAR(128),
posts INTEGER
);
The next step is to populate the table with some data.
INSERT INTO blog_authors VALUES
(1, 'Alex', 'sql', 23),
(2, 'John', 'nodejs', 76),
(3, 'Mark', 'javascript', 87),
(4, 'Alex', 'nodejs', 34),
(5, 'Mark', 'sql', 15),
(6, 'John', 'javascript', 98);
And lastly, we can do a query to see what the table looks like.
mysql> SELECT * FROM blog_authors;
+------+----------+------------+-------+
| id | username | category | posts |
+------+----------+------------+-------+
| 1 | Alex | sql | 23 |
| 2 | John | nodejs | 76 |
| 3 | Mark | javascript | 87 |
| 4 | Alex | nodejs | 34 |
| 5 | Mark | sql | 15 |
| 6 | John | javascript | 98 |
+------+----------+------------+-------+
6 rows in set (0.001 sec)
Now that we have a sample table with some data, we can move on to GROUP BY
examples.
GROUP BY on individual columns
You can use GROUP BY
to query and sort column data based on an individual column entry.
One of the most basic use cases for executing this query is to understand how many unique entries there are in a specific column.
mysql> SELECT username FROM blog_authors GROUP BY username;
+----------+
| username |
+----------+
| Alex |
| John |
| Mark |
+----------+
3 rows in set (0.001 sec)
We can also do this for the posts column.
mysql> SELECT posts FROM blog_authors GROUP BY posts;
+-------+
| posts |
+-------+
| 15 |
| 23 |
| 34 |
| 76 |
| 87 |
| 98 |
+-------+
6 rows in set (0.001 sec)
The next step is to try the same but query multiple columns at once.
GROUP BY on multiple columns
GROUP BY
can be used to fetch records from multiple columns that share the same value, which in our case would be username and category columns.
mysql> SELECT username, category FROM blog_authors GROUP BY username, category;
+----------+------------+
| username | category |
+----------+------------+
| Alex | nodejs |
| Alex | sql |
| John | javascript |
| John | nodejs |
| Mark | javascript |
| Mark | sql |
+----------+------------+
6 rows in set (0.001 sec)
The aforementioned examples are quite simple in nature and can be achieved through other queries, such as WHERE IS statements. But where GROUP BY
shines the most is the ability to sort through column data using specific calculations using COUNT, SUM, MIN, and MAX functions.
GROUP BY aggregating min, max, and total column value
In this example, let’s aggregate the total values of our posts column and group them by username.
mysql> SELECT username, COUNT(posts), SUM(posts), MIN(posts), MAX(posts) FROM blog_authors GROUP BY username;
+----------+--------------+------------+------------+------------+
| username | COUNT(posts) | SUM(posts) | MIN(posts) | MAX(posts) |
+----------+--------------+------------+------------+------------+
| Alex | 2 | 57 | 23 | 34 |
| John | 2 | 174 | 76 | 98 |
| Mark | 2 | 102 | 15 | 87 |
+----------+--------------+------------+------------+------------+
3 rows in set (0.000 sec)
And finally, let’s look at using HAVING
and WHERE
together with GROUP BY
.
GROUP BY using HAVING and WHERE statements
WHERE and HAVING statements are used to further narrow the data that you’re trying to GROUP BY in your query.
If you’re familiar with SQL you already know what both statements do, and if not:
- WHERE is used to specify a condition before grouping.
- HAVING is used to specify a condition after grouping.
Let’s look at two examples showcasing both statements.
WHERE
mysql> SELECT username, SUM(posts) FROM blog_authors WHERE username = 'Alex' GROUP BY username;
+----------+------------+
| username | SUM(posts) |
+----------+------------+
| Alex | 57 |
+----------+------------+
1 row in set (0.054 sec)
/* here we fetch the total number of posts and group it by username */
HAVING
mysql> SELECT username, SUM(posts) FROM blog_authors GROUP BY username HAVING SUM(posts) >=60;
+----------+------------+
| username | SUM(posts) |
+----------+------------+
| John | 174 |
| Mark | 102 |
+----------+------------+
2 rows in set (0.047 sec)
/* here we fetch all the users who have made more than 60 posts in total */
That concludes this snippet. I’m going to monitor this page for keywords and then if there’s a specific use case that’s missing – I will in due time write a section for it. Other than that, referencing the official MySQL/PostgreSQL documentation is the way to go.