Skip to main content

PostgreSQL 14 Database Refresher :: Introduction Part II

Shenzhen, China

Foreign Keys

REFERENCES

As part of the previous step I created 4 tables:

CREATE TABLE users (
  user_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  username VARCHAR ( 25 ) UNIQUE NOT NULL,
  email VARCHAR ( 50 ) UNIQUE NOT NULL,
  full_name VARCHAR ( 100 ) NOT NULL,
  last_login TIMESTAMP,
  created_on TIMESTAMP NOT NULL
);

CREATE TABLE boards (
  board_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  board_name VARCHAR ( 50 ) UNIQUE NOT NULL,
  board_description TEXT NOT NULL
);

CREATE TABLE comments (
  comment_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  user_id INT REFERENCES users(user_id) ON DELETE CASCADE,
  board_id INT REFERENCES boards(board_id) ON DELETE CASCADE,
  comment TEXT NOT NULL,
  time TIMESTAMP
);

CREATE TABLE rich_content (
  content_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  comment_id INT REFERENCES comments(comment_id) ON DELETE CASCADE,
  content JSONB NOT NULL
);

The comment and rich_content table both contain references to other tables - those are called foreign keys. Importing keys from other tables, instead of duplicating them, makes it much harder to get your data out of sync!

ON DELETE

user_id INT REFERENCES users(user_id) ON DELETE CASCADE,
board_id INT REFERENCES boards(board_id) ON DELETE CASCADE,

You can also define a default behaviour what should be done if the key, that is imported into another table, is deleted. The default behaviour is to prevent the deletion ON DELETE NO ACTION as long there is still data somewhere else that relies on that key - e.g. you cannot delete a user before all his postings are deleted.

The ON DELETE CASCADE block means that every posting of this user is deleted as well if the user is deleted inside the users table.

  • ON DELETE NO ACTION: Deleting the user will throw an error if there is still related content.
  • ON DELETE CASCADE: Deleting the user deletes everything that is bound to his ID.
  • ON DELETE SET NULL: Will set the username to NULL while keeping the users other data intact.

Joins

INNER JOIN

The inner join uses the overlap of two tables to find additional, related data. Like an User ID might might have data assigned to it in several tables. You can use the INNER JOIN to match rows in those tables when they have the same value for user_id.

EXAMPLE: Get first 60 characters of all comments from a specific user board and project it as preview:

SELECT comment_id, user_id, LEFT(comment, 60) AS preview FROM comments WHERE board_id='39';

 comment_id | user_id |                           preview                            
------------+---------+--------------------------------------------------------------
         63 |     858 | Maecenas tristique, est et tempus semper, est quam pharetra 
        358 |     876 | Mauris enim leo, rhoncus sed, vestibulum sit amet, cursus id
        410 |     344 | Praesent blandit. Nam nulla. Integer pede justo, lacinia ege
        429 |     789 | Maecenas ut massa quis augue luctus tincidunt. Nulla mollis 
        463 |     925 | Phasellus sit amet erat. Nulla tempus. Vivamus in felis eu s
        485 |     112 | Maecenas tristique, est et tempus semper, est quam pharetra 
        524 |      41 | Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Pr
        532 |     502 | In hac habitasse platea dictumst. Morbi vestibulum, velit id
        540 |     588 | Nullam porttitor lacus at turpis. Donec posuere metus vitae 
        545 |     587 | Praesent id massa id nisl venenatis lacinia. Aenean sit amet
        551 |     620 | Morbi porttitor lorem id ligula. Suspendisse ornare consequa
        972 |     998 | Aenean lectus. Pellentesque eget nunc. Donec quis orci eget 

The comments table does not contain the username. We can use a JOIN command to connect the user_id from the comments table with the user_id in the users table and read out the username where both match comments.user_id = users.user_id:

SELECT
  comments.comment_id, comments.user_id, users.username, time, LEFT(comment, 20) AS preview
  FROM
    comments
  INNER JOIN
    users
  ON
    comments.user_id = users.user_id
  WHERE
    board_id=39;


 comment_id | user_id |   username   |        time         |       preview        
------------+---------+--------------+---------------------+----------------------
        524 |      41 | klicciardo14 | 2018-07-09 15:36:28 | Lorem ipsum dolor si
        485 |     112 | erobrose33   | 2019-08-30 02:32:38 | Maecenas tristique, 
        410 |     344 | tposse9j     | 2018-10-07 19:02:52 | Praesent blandit. Na
        532 |     502 | dbenndx      | 2019-07-06 04:08:15 | In hac habitasse pla
        545 |     587 | rspitaroga   | 2019-11-03 22:48:08 | Praesent id massa id
        540 |     588 | rrandlegb    | 2020-01-21 19:52:11 | Nullam porttitor lac
        551 |     620 | bcarlawh7    | 2020-06-11 00:37:22 | Morbi porttitor lore
        429 |     789 | mpelllw      | 2019-05-24 14:56:49 | Maecenas ut massa qu
         63 |     858 | uvickarnt    | 2020-07-31 04:46:40 | Maecenas tristique, 
        358 |     876 | egeffenob    | 2020-06-20 02:28:58 | Mauris enim leo, rho
        463 |     925 | hderrickpo   | 2018-01-02 14:48:49 | Phasellus sit amet e
        972 |     998 | imcdowallrp  | 2018-03-15 00:23:38 | Aenean lectus. Pelle
(12 rows)

NATURAL INNER JOIN

The query can be simplified when the field has the identical name in both tables user.id by using NATURAL INNER JOIN instead:

SELECT
  comments.comment_id, comments.user_id, users.username, time, LEFT(comment, 20) AS preview
  FROM
    comments
  NATURAL INNER JOIN
    users
  WHERE
    board_id=39;


 comment_id | user_id |   username   |        time         |       preview        
------------+---------+--------------+---------------------+----------------------
        524 |      41 | klicciardo14 | 2018-07-09 15:36:28 | Lorem ipsum dolor si
        485 |     112 | erobrose33   | 2019-08-30 02:32:38 | Maecenas tristique, 
        410 |     344 | tposse9j     | 2018-10-07 19:02:52 | Praesent blandit. Na
        532 |     502 | dbenndx      | 2019-07-06 04:08:15 | In hac habitasse pla
        545 |     587 | rspitaroga   | 2019-11-03 22:48:08 | Praesent id massa id
        540 |     588 | rrandlegb    | 2020-01-21 19:52:11 | Nullam porttitor lac
        551 |     620 | bcarlawh7    | 2020-06-11 00:37:22 | Morbi porttitor lore
        429 |     789 | mpelllw      | 2019-05-24 14:56:49 | Maecenas ut massa qu
         63 |     858 | uvickarnt    | 2020-07-31 04:46:40 | Maecenas tristique, 
        358 |     876 | egeffenob    | 2020-06-20 02:28:58 | Mauris enim leo, rho
        463 |     925 | hderrickpo   | 2018-01-02 14:48:49 | Phasellus sit amet e
        972 |     998 | imcdowallrp  | 2018-03-15 00:23:38 | Aenean lectus. Pelle
(12 rows)

Subqueries

SQL allows you cascade several queries into each other. This is, basically, the long way to what JOIN is the shortcut to.

SELECT
  comment_id, user_id, LEFT(comment, 20)
  FROM
    comments
  WHERE
    user_id = (SELECT user_id from users WHERE username = 'klicciardo14');


     comment_id | user_id |         left         
------------+---------+----------------------
        524 |      41 | Lorem ipsum dolor si
(1 row)

Grouping Results

GROUP BY

Select all the boards on your forum and find the one that has the most comments by aggregating the amounts of comments on each board.

SELECT
  boards.board_name, COUNT(comment_id) AS comment_count
FROM
  comments
NATURAL INNER JOIN
  boards
GROUP BY
  boards.board_name
ORDER BY
  comment_count DESC
LIMIT 10;


      board_name      | comment_count 
----------------------+---------------
 Cloned               |            18
 budgetary management |            18
 open system          |            16
 Universal            |            16
 analyzer             |            15
 puppies              |            15
 Balanced             |            14
 leverage             |            14
 Seamless             |            14
 Innovative           |            13
(10 rows)

Count comment_id in the comments table and group those counts by the name of the board in which they were posted - which can be found by an inner join from the boards table.

Indexes

Get all the post on a specific board and order them by the time that they where published:

SELECT comment_id, user_id, time, LEFT(comment,20) FROM comments WHERE board_id=39 ORDER BY time DESC;

 comment_id | user_id |        time         |         left         
------------+---------+---------------------+----------------------
         63 |     858 | 2020-07-31 04:46:40 | Maecenas tristique, 
        358 |     876 | 2020-06-20 02:28:58 | Mauris enim leo, rho
        551 |     620 | 2020-06-11 00:37:22 | Morbi porttitor lore
        540 |     588 | 2020-01-21 19:52:11 | Nullam porttitor lac
        545 |     587 | 2019-11-03 22:48:08 | Praesent id massa id
        485 |     112 | 2019-08-30 02:32:38 | Maecenas tristique, 
        532 |     502 | 2019-07-06 04:08:15 | In hac habitasse pla
        429 |     789 | 2019-05-24 14:56:49 | Maecenas ut massa qu
        410 |     344 | 2018-10-07 19:02:52 | Praesent blandit. Na
        524 |      41 | 2018-07-09 15:36:28 | Lorem ipsum dolor si
        972 |     998 | 2018-03-15 00:23:38 | Aenean lectus. Pelle
        463 |     925 | 2018-01-02 14:48:49 | Phasellus sit amet e

To check the cost (how much time/cpu power will it take to run this query) check:

EXPLAIN SELECT comment_id, user_id, time, LEFT(comment,20) FROM comments WHERE board_id=39 ORDER BY time DESC
;

                           QUERY PLAN                            
-----------------------------------------------------------------
 Sort  (cost=65.75..65.78 rows=12 width=48)
   Sort Key: "time" DESC
   ->  Seq Scan on comments  (cost=0.00..65.53 rows=12 width=48)
         Filter: (board_id = 39)
(4 rows)

The sequential scan of every single comment to see if it fits the query takes up almost all the time. You can index the table to get this demand down. In the example above it makes sense to index each comment based on it's board_id - allowing us to pre-exclude every comment that was posted on a different board.

CREATE INDEX ON comments (board_id);

Re-checking the cost shows that the cost for our query was almost cut in half by now using the Bitmap Heap Scan:

EXPLAIN SELECT comment_id, user_id, time, LEFT(comment,20) FROM comments WHERE board_id=39 ORDER BY time DESC
;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Sort  (cost=33.73..33.76 rows=12 width=48)
   Sort Key: "time" DESC
   ->  Bitmap Heap Scan on comments  (cost=4.37..33.51 rows=12 width=48)
         Recheck Cond: (board_id = 39)
         ->  Bitmap Index Scan on comments_board_id_idx  (cost=0.00..4.37 rows=12 width=0)
               Index Cond: (board_id = 39)
(6 rows)