PostgreSQL 14 Database Refresher :: Introduction Part II
Foreign Keys
REFERENCES
As part of the previous step I created 4 tables:
CREATE TABLE users (
user_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
username VARCHAR ( 25 ) UNIQUE NOT NULL,
email VARCHAR ( 50 ) UNIQUE NOT NULL,
full_name VARCHAR ( 100 ) NOT NULL,
last_login TIMESTAMP,
created_on TIMESTAMP NOT NULL
);
CREATE TABLE boards (
board_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
board_name VARCHAR ( 50 ) UNIQUE NOT NULL,
board_description TEXT NOT NULL
);
CREATE TABLE comments (
comment_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
user_id INT REFERENCES users(user_id) ON DELETE CASCADE,
board_id INT REFERENCES boards(board_id) ON DELETE CASCADE,
comment TEXT NOT NULL,
time TIMESTAMP
);
CREATE TABLE rich_content (
content_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
comment_id INT REFERENCES comments(comment_id) ON DELETE CASCADE,
content JSONB NOT NULL
);
The comment
and rich_content
table both contain references to other tables - those are called foreign keys. Importing keys from other tables, instead of duplicating them, makes it much harder to get your data out of sync!
ON DELETE
user_id INT REFERENCES users(user_id) ON DELETE CASCADE,
board_id INT REFERENCES boards(board_id) ON DELETE CASCADE,
You can also define a default behaviour what should be done if the key, that is imported into another table, is deleted. The default behaviour is to prevent the deletion ON DELETE NO ACTION
as long there is still data somewhere else that relies on that key - e.g. you cannot delete a user before all his postings are deleted.
The ON DELETE CASCADE
block means that every posting of this user is deleted as well if the user is deleted inside the users
table.
ON DELETE NO ACTION
: Deleting the user will throw an error if there is still related content.ON DELETE CASCADE
: Deleting the user deletes everything that is bound to his ID.ON DELETE SET NULL
: Will set the username to NULL while keeping the users other data intact.
Joins
INNER JOIN
The inner join uses the overlap of two tables to find additional, related data. Like an User ID might might have data assigned to it in several tables. You can use the INNER JOIN to match rows in those tables when they have the same value for user_id
.
EXAMPLE: Get first 60 characters of all comments from a specific user board and project it as preview
:
SELECT comment_id, user_id, LEFT(comment, 60) AS preview FROM comments WHERE board_id='39';
comment_id | user_id | preview
------------+---------+--------------------------------------------------------------
63 | 858 | Maecenas tristique, est et tempus semper, est quam pharetra
358 | 876 | Mauris enim leo, rhoncus sed, vestibulum sit amet, cursus id
410 | 344 | Praesent blandit. Nam nulla. Integer pede justo, lacinia ege
429 | 789 | Maecenas ut massa quis augue luctus tincidunt. Nulla mollis
463 | 925 | Phasellus sit amet erat. Nulla tempus. Vivamus in felis eu s
485 | 112 | Maecenas tristique, est et tempus semper, est quam pharetra
524 | 41 | Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Pr
532 | 502 | In hac habitasse platea dictumst. Morbi vestibulum, velit id
540 | 588 | Nullam porttitor lacus at turpis. Donec posuere metus vitae
545 | 587 | Praesent id massa id nisl venenatis lacinia. Aenean sit amet
551 | 620 | Morbi porttitor lorem id ligula. Suspendisse ornare consequa
972 | 998 | Aenean lectus. Pellentesque eget nunc. Donec quis orci eget
The comments
table does not contain the username. We can use a JOIN
command to connect the user_id
from the comments
table with the user_id
in the users
table and read out the username where both match comments.user_id = users.user_id
:
SELECT
comments.comment_id, comments.user_id, users.username, time, LEFT(comment, 20) AS preview
FROM
comments
INNER JOIN
users
ON
comments.user_id = users.user_id
WHERE
board_id=39;
comment_id | user_id | username | time | preview
------------+---------+--------------+---------------------+----------------------
524 | 41 | klicciardo14 | 2018-07-09 15:36:28 | Lorem ipsum dolor si
485 | 112 | erobrose33 | 2019-08-30 02:32:38 | Maecenas tristique,
410 | 344 | tposse9j | 2018-10-07 19:02:52 | Praesent blandit. Na
532 | 502 | dbenndx | 2019-07-06 04:08:15 | In hac habitasse pla
545 | 587 | rspitaroga | 2019-11-03 22:48:08 | Praesent id massa id
540 | 588 | rrandlegb | 2020-01-21 19:52:11 | Nullam porttitor lac
551 | 620 | bcarlawh7 | 2020-06-11 00:37:22 | Morbi porttitor lore
429 | 789 | mpelllw | 2019-05-24 14:56:49 | Maecenas ut massa qu
63 | 858 | uvickarnt | 2020-07-31 04:46:40 | Maecenas tristique,
358 | 876 | egeffenob | 2020-06-20 02:28:58 | Mauris enim leo, rho
463 | 925 | hderrickpo | 2018-01-02 14:48:49 | Phasellus sit amet e
972 | 998 | imcdowallrp | 2018-03-15 00:23:38 | Aenean lectus. Pelle
(12 rows)
NATURAL INNER JOIN
The query can be simplified when the field has the identical name in both tables user.id
by using NATURAL INNER JOIN
instead:
SELECT
comments.comment_id, comments.user_id, users.username, time, LEFT(comment, 20) AS preview
FROM
comments
NATURAL INNER JOIN
users
WHERE
board_id=39;
comment_id | user_id | username | time | preview
------------+---------+--------------+---------------------+----------------------
524 | 41 | klicciardo14 | 2018-07-09 15:36:28 | Lorem ipsum dolor si
485 | 112 | erobrose33 | 2019-08-30 02:32:38 | Maecenas tristique,
410 | 344 | tposse9j | 2018-10-07 19:02:52 | Praesent blandit. Na
532 | 502 | dbenndx | 2019-07-06 04:08:15 | In hac habitasse pla
545 | 587 | rspitaroga | 2019-11-03 22:48:08 | Praesent id massa id
540 | 588 | rrandlegb | 2020-01-21 19:52:11 | Nullam porttitor lac
551 | 620 | bcarlawh7 | 2020-06-11 00:37:22 | Morbi porttitor lore
429 | 789 | mpelllw | 2019-05-24 14:56:49 | Maecenas ut massa qu
63 | 858 | uvickarnt | 2020-07-31 04:46:40 | Maecenas tristique,
358 | 876 | egeffenob | 2020-06-20 02:28:58 | Mauris enim leo, rho
463 | 925 | hderrickpo | 2018-01-02 14:48:49 | Phasellus sit amet e
972 | 998 | imcdowallrp | 2018-03-15 00:23:38 | Aenean lectus. Pelle
(12 rows)
Subqueries
SQL allows you cascade several queries into each other. This is, basically, the long way to what JOIN
is the shortcut to.
SELECT
comment_id, user_id, LEFT(comment, 20)
FROM
comments
WHERE
user_id = (SELECT user_id from users WHERE username = 'klicciardo14');
comment_id | user_id | left
------------+---------+----------------------
524 | 41 | Lorem ipsum dolor si
(1 row)
Grouping Results
GROUP BY
Select all the boards on your forum and find the one that has the most comments by aggregating the amounts of comments on each board.
SELECT
boards.board_name, COUNT(comment_id) AS comment_count
FROM
comments
NATURAL INNER JOIN
boards
GROUP BY
boards.board_name
ORDER BY
comment_count DESC
LIMIT 10;
board_name | comment_count
----------------------+---------------
Cloned | 18
budgetary management | 18
open system | 16
Universal | 16
analyzer | 15
puppies | 15
Balanced | 14
leverage | 14
Seamless | 14
Innovative | 13
(10 rows)
Count comment_id
in the comments
table and group those counts by the name of the board in which they were posted - which can be found by an inner join from the boards
table.
Indexes
Get all the post on a specific board and order them by the time that they where published:
SELECT comment_id, user_id, time, LEFT(comment,20) FROM comments WHERE board_id=39 ORDER BY time DESC;
comment_id | user_id | time | left
------------+---------+---------------------+----------------------
63 | 858 | 2020-07-31 04:46:40 | Maecenas tristique,
358 | 876 | 2020-06-20 02:28:58 | Mauris enim leo, rho
551 | 620 | 2020-06-11 00:37:22 | Morbi porttitor lore
540 | 588 | 2020-01-21 19:52:11 | Nullam porttitor lac
545 | 587 | 2019-11-03 22:48:08 | Praesent id massa id
485 | 112 | 2019-08-30 02:32:38 | Maecenas tristique,
532 | 502 | 2019-07-06 04:08:15 | In hac habitasse pla
429 | 789 | 2019-05-24 14:56:49 | Maecenas ut massa qu
410 | 344 | 2018-10-07 19:02:52 | Praesent blandit. Na
524 | 41 | 2018-07-09 15:36:28 | Lorem ipsum dolor si
972 | 998 | 2018-03-15 00:23:38 | Aenean lectus. Pelle
463 | 925 | 2018-01-02 14:48:49 | Phasellus sit amet e
To check the cost
(how much time/cpu power will it take to run this query) check:
EXPLAIN SELECT comment_id, user_id, time, LEFT(comment,20) FROM comments WHERE board_id=39 ORDER BY time DESC
;
QUERY PLAN
-----------------------------------------------------------------
Sort (cost=65.75..65.78 rows=12 width=48)
Sort Key: "time" DESC
-> Seq Scan on comments (cost=0.00..65.53 rows=12 width=48)
Filter: (board_id = 39)
(4 rows)
The sequential scan of every single comment to see if it fits the query takes up almost all the time. You can index the table to get this demand down. In the example above it makes sense to index each comment based on it's board_id
- allowing us to pre-exclude every comment that was posted on a different board.
CREATE INDEX ON comments (board_id);
Re-checking the cost shows that the cost for our query was almost cut in half by now using the Bitmap Heap Scan:
EXPLAIN SELECT comment_id, user_id, time, LEFT(comment,20) FROM comments WHERE board_id=39 ORDER BY time DESC
;
QUERY PLAN
-------------------------------------------------------------------------------------------
Sort (cost=33.73..33.76 rows=12 width=48)
Sort Key: "time" DESC
-> Bitmap Heap Scan on comments (cost=4.37..33.51 rows=12 width=48)
Recheck Cond: (board_id = 39)
-> Bitmap Index Scan on comments_board_id_idx (cost=0.00..4.37 rows=12 width=0)
Index Cond: (board_id = 39)
(6 rows)