Blogi 25.10.2019

Modelling database relations and constraints for “monogamous” relationships

Kiva kun löysit tämän artikkelin! Se sisältää varmasti hyvää tietoa, mutta pidäthän mielessä, että se on kirjoitettu 5 vuotta sitten.

Recently in a project, I stumbled into an interesting puzzle and decided to share it with you along with my own solution. Part of the credit goes to Jarkko Hyöty for great feedback and ideas.
We needed a data model where exactly two persons could have a partnership with each other during some time period, while one person should never belong to more than one partnership simultaneously. Initially, the problem sounded quite simple, but it starts to get a little more tricky as you dive deeper into it – especially if you want to model those rules as database constraints. It might make sense to model this sort of a thing rather in a graph database, but we wanted to stick with the good old PostgreSQL which we were already using.

First steps towards the solution

So how might we start to model this in a relational database? My first idea for modelling this kind of one-to-two relationship was something like this:

CREATE TABLE person (
    id              SERIAL PRIMARY KEY, -- auto-incrementing integer
    name            TEXT NOT NULL
    -- other person details ...
);
CREATE TABLE partnership (
    id              UUID PRIMARY KEY DEFAULT uuid_generate_v1(),
    person1         INTEGER REFERENCES person(id) ON DELETE CASCADE,
    person2         INTEGER REFERENCES person(id) ON DELETE CASCADE,
    start_date      DATE NOT NULL,
    end_date        DATE NOT NULL CHECK (start_date <= end_date)
);

That does perfectly limit the number of partners in a partnership to exactly two. However, there is a troublesome distinction between person 1 and person 2. The roles in the partnership are not strictly equal. So maybe something like this instead?

CREATE TABLE partnership (
    id              UUID PRIMARY KEY DEFAULT uuid_generate_v1(),
    start_date      DATE NOT NULL,
    end_date        DATE NOT NULL CHECK (start_date <= end_date)
);
CREATE TABLE partner (
    id              UUID PRIMARY KEY DEFAULT uuid_generate_v1(),
    partnership_id  UUID REFERENCES partnership(id) ON DELETE CASCADE,
    person_id       INTEGER REFERENCES person(id) ON DELETE CASCADE
);

Limiting the maximum number of partners

Now we no longer distinguish between partners 1 and 2, so that is good, but since this is now a standard one-to-many relationship, it no longer covers the requirement of there being exactly two people in a partnership. That requirement can be divided into two parts: 1) a partnership must have at least two members, and 2) a partnership must have at most two members. The first one is actually quite difficult, so let’s return to it later. The second one, however, can be solved for example like this:

CREATE TABLE partnership (
    id              UUID PRIMARY KEY DEFAULT uuid_generate_v1(),
    start_date      DATE NOT NULL,
    end_date        DATE NOT NULL CHECK (start_date <= end_date)
);
CREATE TABLE partner (
    partnership_id  UUID REFERENCES partnership(id) ON DELETE CASCADE,
    ind             SMALLINT NOT NULL CHECK (ind BETWEEN 1 AND 2),
    person_id       INTEGER REFERENCES person(id) ON DELETE CASCADE,
    PRIMARY KEY (partnership_id, ind),
    UNIQUE (partnership_id, person_id)
);

Here we have removed the separate primary key id from the partner table and instead added a composite primary key of partnership_id and ind. Here ind (i.e. index) is a new integer column, where the value must equal either 1 or 2. Because it is used in the composite key, it cannot have the same value twice within a partnership. Therefore it enforces the constraint that there can be at most two people in a partnership. Finally, we also add a unique constraint for (partnership_id, person_id) to prevent one person from being in a partnership with him/herself, although we will soon be adding another constraint which effectively also does prevent this.

Preventing overlapping partnerships

What about the requirement of no overlapping partnerships then? At first, it sounded impossible to enforce this on the database constraint level, but after some research, I found out that the PostgreSQL exclude constraint combined with a date range and an overlap operator might be a potential fit for this. For example, in this presentation by Jeff Davis, those were used to guarantee that there are no overlapping reservations for the same room. However, applying it to our current data model does not seem to quite work out. This kind of constraint can only compare rows within one table, but we have person_id and dates in different tables. Could we somehow still make it work?
Let’s try changing our data model a bit more by moving the date columns from the partnership association table into the same partner table with person_id. At this point the partnership table has no other columns left except for the primary key id. Therefore it is actually redundant and can be left out if we just generate the shared partnership_id at the application level instead.

CREATE TABLE partner (
    partnership_id  UUID NOT NULL,
    ind             SMALLINT NOT NULL CHECK (ind BETWEEN 1 AND 2),
    person_id       INTEGER REFERENCES person(id) ON DELETE CASCADE,
    start_date      DATE NOT NULL,
    end_date        DATE NOT NULL CHECK (start_date <= end_date),
    PRIMARY KEY (partnership_id, ind),
    UNIQUE (partnership_id, person_id)
);

Since all the relevant data is now located at the same table, it becomes possible to add an exclusion constraint which prevents one person from having two simultaneous partnerships. The idea we want to express in the constraint is that if any two rows have the same person_id and their durations overlap, then that is a constraint violation. The syntax for that is this:

EXCLUDE USING gist (
    person_id WITH =,
    daterange(start_date, end_date, '[]') WITH &&
)

Note that this uses a gist index so you need to have the btree_gist extension enabled.

CREATE EXTENSION IF NOT EXISTS btree_gist;

As you may have noticed while changing the data model we simultaneously created a new problem though. Since the dates are no longer shared through the association table, it is now possible for the two members of a partnership to have different durations for it. This does not really make sense here. Fortunately, that can be remedied by adding a couple more exclusion constraints which will cause a constraint violation if two rows with equal partnership_id values have differing start_date or end_date values. Note that when updating dates of an existing partnership, this constraint will temporarily break until the second row has also been updated, so we must make these constraints deferred to only enforce them at transaction commit time.

EXCLUDE USING gist (partnership_id WITH =, start_date WITH <>) DEFERRABLE INITIALLY DEFERRED,
EXCLUDE USING gist (partnership_id WITH =, end_date WITH <>) DEFERRABLE INITIALLY DEFERRED

Limiting the minimum number of partners

Finally, let’s return to the constraint that a valid partnership should always have two persons in it – not just one. As far as I know, this can only be solved by a trigger function which checks the partnership member count on insert, update and delete, and raises an exception if the constraint is violated. Furthermore, it has to be a deferred trigger, so that the rule is only enforced at transaction commit time, instead of raising an error immediately when you try to insert the first person to the partnership before you even have had a chance to insert the second person.
This kind of triggers can cause a considerable amount of complexity and may be difficult to maintain and debug, so in the end, we decided that enforcing this constraint is best left to the application code since it is anyway trivial to do there.

Testing out the solution

So does our finished data model now make sense and is it usable? Let’s draft a couple of queries to validate that!
Creating a new partnership can be done by inserting two rows having same partnership_id, start_date and end_date with each other and different ind and .

-- Assuming we have two persons in the person table with ids 1 and 2:
-- INSERT INTO person (name) VALUES ('Donald'), ('Daisy');
INSERT INTO partner VALUES
    (
        '336a7c66-a43c-478d-a724-a65b377d77ee',     -- some generated partnership_id
        1,                                          -- first ind
        1,                                          -- first person_id
        '2018-01-01',                               -- start_date
        '2019-06-30'                                -- end_date
    ),
    (
        '336a7c66-a43c-478d-a724-a65b377d77ee',     -- same partnership_id
        2,                                          -- second ind
        2,                                          -- second person_id
        '2018-01-01',                               -- same start_date
        '2019-06-30'                                -- same end_date
    );

Fetching partners for a given person is slightly more complex, but not too bad either. We just need to fetch all rows where there exists another row with the same partnership_id and the desired person_id. So to get for example Donald’s partners we could write:

Fetching Donald’s partners
SELECT person_id, name, start_date, end_date
FROM partner p1                                         -- Select from partner table
LEFT OUTER JOIN person ON person.id = p1.person_id      -- (joined with the person table just to get the partner's name too)
WHERE EXISTS (                                          -- all the rows for which there exists
    SELECT 1 FROM partner p2                            -- another partner row
    WHERE p2.person_id = 1 AND                          -- where the person is Donald (id 1)
          p2.partnership_id = p1.partnership_id AND     -- which is part of the same partnership
          p2.ind <> p1.ind                                -- but is not the same row
);

Resulting in

person_id
name
start_date
end_date
2
Daisy
2018-01-01
2019-06-30

Updating a partnership duration can also be done with just one simple statement, even though the dates must be updated on both rows. Note also, that in order to keep the time comparisons simple we did not make end_date nullable, but we can use a postgreSQL special value ‘infinity’ if we want to express that a partnership currently has no known end date. That value can later be converted into null at the application level if desired.

UPDATE partner
SET start_date = '2018-02-10', end_date = 'infinity'
WHERE partnership_id = '336a7c66-a43c-478d-a724-a65b377d77ee';

All in all the solution seems to work fine, even though one can question if it would have made more sense to just check all the constraints in the application level to keep the data model simpler. Anyway, if you have alternative solutions for this little puzzle I’ll be interested in hearing them. Cheers!

English

Joosa Kurvinen

Joosa is an experienced fullstack software developer with a very broad skill set. He is always eager to learn and try out new things, regardless of whether that is with backend, frontend, devops or architecture. He has an agile mindset and always strives after clean and testable code. Joosa graduated from the University of Helsinki and wrote his master's thesis on AI and optimization algorithms.

Takaisin ylös