Why is there a pyramid image on a SQL article ?

Haha, good question ! This article is about how to organize a SQL table to represent a tree as a pyramid shape. This technique was explained to me briefly by very skilled SQL programmer. And I must say that I did not understand the interest at first. Only by putting the explanations on paper (and by doing some drawings) I understood all the interest of this clever method.

I am not an SQL specialist, this technique is probably well known to SQL developers, perhaps with another name. Since I do not find good literature on this subject, I write mine. So forgive my ignorance for not using the words / terms established by the (SQL) profession.

Update 2018-06-01 : Finally I found some very good article on the subject (in French), the technique seems to have the french name “représentation intervallaire”, there is very good article by SQLpro and I found another on zestedesavoir. Maybe their explanations are better … but my drawings are prettier ☺ . The technique seems to be known as “Nested set model”

First : A tree in SQL

It’s easy in SQL to store elements ordered as a tree : just having a column parent_index pointing to a parent aka auto-reference.

The SQL:1999 standard added recursive capability to SQL. So it’s possible to browse a tree using keyword WITH RECURSIVE in a request. The DBMS will do the recursive tree browsing.

And for sure it open up new SQL usage

Why I avoid it as possible

I feel not confortable using SQL to do recursive tree browsing. For 2 reasons :

  1. First I need time to understand each query using WITH RECURSIVE: I always need to take a pen and a paper … OK I know people who have no problem with that, so this is not a good reason (but still)
  2. SQL is a fantastic tool for managing multiple sets and applying all kinds of merge / group, filter and order operations without writing iterative code. Doing an iterative task with SQL should be an anecdotal use because SQL is not a tool designed for this (IMO).

SQL is good at handling structured data and their relationships with some sort of magic: avoiding (or hiding from the programmer) most of the iterative processes. WITH RECURSIVE breaks this and exposes an iterative process.

What SQL databases are really good at :

SQL set join/exclude

The “pyramidal stack”

What I call a “pyramid stack” is a way of representing a tree that does not use recursive query at all thanks to SQL and the organization of some tables / columns.

Draw me a “pyramidal stack”

To explain what a “pyramid stack” is, the best is some drawing. I will do 3 drawing representing same data, the last is a “pyramidal stack” Imagine a tree to store earth species and subspecies (for the demo we only use a few species) :

First a classical tree representation :

Classic tree

This representation is what programmers are used to seeing when it comes to trees.

With a “SQL vision” (vision of group/set), this becomes :

Group tree

This drawing represents the same configuration as before, but now the sets of groups appear more clearly.

And finaly with a “pyramidal stack vision”, this becomes :

Stack tree

You see the pyramids ☺

… and you may wonder where that leads. Keep in mind this picture especialy the number (1 → 14 and 0 → 2), and read next part.

Populate table test

So let’s build this table, for a “normal” tree you will simply use an auto-reference to the table primary key (here group_parent_id) :

create table if not exists "group"
(
  "group_id" serial primary key,
  "group_name" text not null default '',
  "group_parent_id" integer null references "group"("group_id") on delete cascade -- Used to build a classic tree
);

For “pyramidal stack” we will add 3 new column : RIGHT_BOUND, LEFT_BOUND and LEVEL :

ALTER TABLE "group"
ADD "group_left_bound" integer not null default 1;
ALTER TABLE "group"
ADD "group_right_bound" integer not null default 2;
ALTER TABLE "group"
ADD "group_level" integer not null default 0;

As you’ve probably guessed, the group_left_bound / group_right_bound columns will store the numbers on the right of the previous illustration. And group_level will store the (0, 1 and 2 values)

Let’s insert some data to match the preceding illustration..

The resulting table now contains :

name level left_bound right_bound
Living on earth 0 1 14
Feline 1 2 9
Lion 2 3 4
Tiger 2 5 6
Cat 2 7 8
Reptile 1 10 13
Crocodile 2 11 12

Maybe with a reminder of the preceding picture it is clearer :

Stack tree

What we can do thanks to this organization

Select all children of 1 element

Query to get all feline species and subspecies :

SELECT "group_name", "group_id"
FROM  "group"
WHERE "group_left_bound" > 2 --because Feline left_bound=2
and "group_right_bound" < 9  --because Feline right_bound=9
ORDER BY "group_left_bound";

Will return all feline group : Lion, Tiger, Cat

And voilà : no recursive task; just using SQL at what it is good at : filtering data.

To illustrate the importance of group_level column, a query to select only direct descendant of the root element (“Living on Earth”) :

SELECT "group_name", "group_id"
FROM  "group"
WHERE "group_left_bound" > 1    --because Earth left_bound=1
--and "group_right_bound" < 14  --because Earth is the root, you can ommit this
AND "group_level" = 1           --because Earth group_level+1 = 1
ORDER BY "group_left_bound";

Will return just feline and reptile group.

Display the tree

Thanks to group_level column you can represent the tree with a very simple SQL query :

This request is very handy; I will refer to it later as Display_query

SELECT lpad("group_name", length("group_name")+"group_level", '.')
FROM  "group"
WHERE "group_left_bound" >= 1
ORDER BY "group_left_bound";

Should display a tree :

Living on earth
.Feline
..Lion
..Tiger
..Cat
.Reptile
..Crocodile

Select all parent of 1 element

To get all Tiger parents :

SELECT "group_name", "group_id"
FROM  "group"
WHERE "group_left_bound" < 5 -- because tiger left_bound=5
and "group_right_bound" > 6  -- because tiger right_bound=6
--and "group_level" != 0     -- uncomment this line to ommit the root element
ORDER BY "group_left_bound";

Result :

group_name group_id
Living on earth 1
Feline 2

Select all final element

To get all final element (element that don’t have any children)

SELECT "group_name", "group_id"
FROM  "group"
WHERE ("group_right_bound" - "group_left_bound" = 1)

Result :

group_name group_id
Lion 3
Tiger 4
Cat 5
Crocodile 7

Pros and cons

Pros

  • Requests are incredibly simplier
  • With a deep tree requests will be incredibly faster too
  • It’s compatible with the classical tree (see we have column “group_parent_id” present, just updating it allow to use the classic recursive tree browsing)

Cons

  • Adding/deleting element is more complex (see next part for some functions to facilitate this)
  • If we have intensive edition (add/delete) on the table the classical tree will be faster (because there is no need to update other elements right_bound and left_bound than the inserted or deleted one)

Functions to Add, delete and move elements

As you might expect adding, deleting and moving group in our database is way harder than it is with a recursive tree because of all this RIGHT_BOUND and LEFT_BOUND must be updated at each change.

In this part I will use Postgresql (my favorite DBMS) PL function to automate the RIGHT_BOUND and LEFT_BOUND update.

Adding a group

Adding a group at the end of all other is not really complex.
But adding in the middle of an existing group need to move all group with left_bound below the new parent right_bound, and moving the parent right bound too.

-- Add a group to another group
-- insert data
--
-- IMPORTANT: Should be called in a start transaction SERIALIZABLE, as it need to be atomic on the whole table "group"
-- don't use other level, just SERIALIZABLE
-- example : add group 'toto' to group id 4 :
--   start transaction ISOLATION LEVEL SERIALIZABLE;
--     select group_add_group(4, 'toto');
--   commit;
--
-- p_parent_id :        parent id to add group
-- p_name :             new group name
-- return :
--    id of group created if all is okay (this is > 0),
--    -1 if no group created (because parent does not exist NOTE: the root element must exist)
create or replace function "group_add_group"(p_parent_id integer, p_name text)
returns integer
as $$
declare
  new_group_id integer;
  tmp_parent "group";
  new_level integer;
  new_left_bound integer;
  new_right_bound integer;
begin
  new_group_id = -1;

  -- Check parent exist
  if (0 != (select count ("group_id") from "group" where ("group_id" = p_parent_id)) ) then

  -- Get parent
  select * into tmp_parent from "group" where "group_id" = p_parent_id;

  -- Compute current group left/right to insert
  new_level = tmp_parent.group_level + 1;
  new_left_bound = tmp_parent.group_right_bound;
  new_right_bound = tmp_parent.group_right_bound + 1;

  -- Update all group right/left
  update "group" set ("group_right_bound") = (group_right_bound + 2)
    where "group_right_bound" >= new_left_bound;
  update "group" set ("group_left_bound") = (group_left_bound + 2)
    where "group_left_bound" >= new_left_bound;

  -- Insert new group, and get group_id inserted
  insert into "group"("group_name", "group_left_bound", "group_right_bound", "group_parent_id", "group_level")
    values (p_name, new_left_bound, new_right_bound, p_parent_id, new_level)
    returning group_id into new_group_id; -- → Get id inserted

  end if;

  -- return group created (-1 for no group)
  return new_group_id;
end;
$$
language plpgsql;
alter function "group_add_group"(integer, text) owner to "pyramid_demo";

Use this function to populate our table. As this function perform multiple requests in production we must call it in a transaction (multiple call can be grouped in 1 transaction) :

-- Living on earth
insert into "group"("group_name")
values('Living on earth');

start transaction ISOLATION LEVEL SERIALIZABLE;
  -- Feline
  select group_add_group((select "group_id" from "group" where "group_name"='Living on earth' limit 1), 'Feline');
  select group_add_group((select "group_id" from "group" where "group_name"='Feline' limit 1), 'Lion');
  select group_add_group((select "group_id" from "group" where "group_name"='Feline' limit 1), 'Tiger');
  select group_add_group((select "group_id" from "group" where "group_name"='Feline' limit 1), 'Cat');

  -- Reptile
  select group_add_group((select "group_id" from "group" where "group_name"='Living on earth' limit 1), 'Reptile');
  select group_add_group((select "group_id" from "group" where "group_name"='Reptile' limit 1), 'Crocodile');
commit;

Play with it : add a group “leopard” and add to it “African leopard”, “Indian leopard”, “javan leopard”, “Arabian leopard” and “Amur leopard” (yes, thanks to wikipedia I know many leopard).

start transaction ISOLATION LEVEL SERIALIZABLE;
  -- Leopard + many leopard subspecies
  select group_add_group((select "group_id" from "group" where "group_name"='Feline' limit 1), 'Leopard');
  select group_add_group((select "group_id" from "group" where "group_name"='Leopard' limit 1), 'African leopard');
  select group_add_group((select "group_id" from "group" where "group_name"='Leopard' limit 1), 'Indian leopard');
  select group_add_group((select "group_id" from "group" where "group_name"='Leopard' limit 1), 'Arabian leopard');
  select group_add_group((select "group_id" from "group" where "group_name"='Leopard' limit 1), 'Amur leopard');
commit;

After adding all leopards, the Display_query (seen before) show :

Living on earth
.Feline
..Lion
..Tiger
..Cat
..Leopard
...African leopard
...Indian leopard
...Arabian leopard
...Amur leopard
.Reptile
..Crocodile

Deleting group

Deleting need also to update all group with left_bound or right_bound inferior to deleted left_bound.

Note: this function will refuse to delete a group with children, this restriction can be removed but I find it dangerous … and I’m a bit lazy so you must before delete all group’s child to delete a group.

-- Remove a group if the group has no child
-- delete data
--
-- IMPORTANT: Should be called in a start transaction SERIALIZABLE, as it need to be atomic on the whole table "group"
-- don't use other level, just SERIALIZABLE
-- example : remove group id 4 :
--   start transaction ISOLATION LEVEL SERIALIZABLE;
--     select group_remove_group(4);
--   commit;
--
-- p_id :           group id to remove
-- return :
--  id if group delete ok (this is > 0),
--  -1 if no delete peformed because element has child,
--  -2 if element does not exist,
--  -42 if element is root
create or replace function "group_remove_group"(p_id integer)
returns integer
as $$
declare
  tmp_current "group";
  return_id integer;
begin
  return_id = -1;

  -- Check group to delete exist
  if (0 = (select count ("group_id") from "group" where ("group_id" = p_id)) ) then
    return -2;
  end if;

  -- Check element has parent (if he don't have it's a root element = delete nothing)
  select * into tmp_current from "group" where "group_id" = p_id;
  if (tmp_current.group_parent_id IS NULL) then
    return -42;
  end if;

  -- Check element has child (if he has child we remove nothing)
  if (0 = (select count ("group_id") from "group" where ("group_parent_id" = p_id)) ) then

    -- Set return id
    return_id = p_id;

    -- Delete element
    delete from "group" where ("group_id" = p_id);

    -- Update all group right/left
    update "group" set ("group_right_bound") = (group_right_bound - 2)
      where "group_right_bound" >= tmp_current.group_right_bound;
    update "group" set ("group_left_bound") = (group_left_bound - 2)
      where "group_left_bound" >= tmp_current.group_left_bound;

  end if;

  -- return group deleted (-1 for no group deleted)
  return return_id;
end;
$$
language plpgsql;
alter function "group_remove_group"(integer) owner to "pyramid_demo";

So let’s delete Lion and all Reptile (don’t be sad, we will re-insert them later)

start transaction ISOLATION LEVEL SERIALIZABLE;
  -- Delete Lion and Reptile
  select group_remove_group((select "group_id" from "group" where "group_name"='Lion' limit 1));
  select group_remove_group((select "group_id" from "group" where "group_name"='Crocodile' limit 1));
  select group_remove_group((select "group_id" from "group" where "group_name"='Reptile' limit 1));
commit;

After deleting Lion and Reptile, the Display_query (seen before) show :

Living on earth
.Feline
..Tiger
..Cat
..Leopard
...African leopard
...Indian leopard
...Arabian leopard
...Amur leopard

If you feel sad losing Lions and Crocodiles :

start transaction ISOLATION LEVEL SERIALIZABLE;
  -- Revive Lion !
  select group_add_group((select "group_id" from "group" where "group_name"='Feline' limit 1), 'Lion');

  -- Revive all Reptile
  select group_add_group((select "group_id" from "group" where "group_name"='Living on earth' limit 1), 'Reptile');
  select group_add_group((select "group_id" from "group" where "group_name"='Reptile' limit 1), 'Crocodile');
commit;

After re-inserting Lion and reptile you will see that the displayed order changed a bit but the underlying tree is the same :

Living on earth
.Reptile
..Crocodile
.Feline
..Tiger
..Cat
..Leopard
...African leopard
...Indian leopard
...Arabian leopard
...Amur leopard
..Lion

Moving group

Moving a group is by far the hardest part. I move the group in a temporary area (in right_bound max+1), then perform all change to have place to insert group to move and finaly do the move :

-- Move a group
-- update data
--
-- IMPORTANT: Should be called in a start transaction SERIALIZABLE, as it need to be atomic on the whole table "group"
-- don't use other level, just SERIALIZABLE
-- example : remove group id 4 :
--   start transaction ISOLATION LEVEL SERIALIZABLE;
--     select group_move_group(4);
--   commit;
--
-- p_id :            group id to move
-- p_new_parent_id : group id to set as new parent for this group
-- return :
--    id of group moved if group move ok (this is > 0),
--    -1 if no move peformed because parent don't exist,
--    -2 if group to move don't exist,
--    -3 if try to move a group to one of his child
--    -4 if group to move has already this group as father (nothing done)
create or replace function "group_move_group"(p_id integer, p_new_parent_id integer)
returns integer
as $$
declare
  tmp_current "group";
  tmp_parent "group";
  current_size integer;
  biggest_r integer;
  current_l integer;
  current_r integer;
  move_offset integer;
  move_level integer;
begin

  -- Check new parent exist
  if (0 = (select count ("group_id") from "group" where ("group_id" = p_new_parent_id)) ) then
    return -1;
  end if;

  -- Check group to move exist
  if (0 = (select count ("group_id") from "group" where ("group_id" = p_id)) ) then
    return -2;
  end if;

  -- Check we don't try to move group to one of his own child
  select * into tmp_current from "group" where "group_id" = p_id;
  select * into tmp_parent from "group" where "group_id" = p_new_parent_id;
  if (tmp_current.group_left_bound < tmp_parent.group_left_bound) then
    if (tmp_current.group_right_bound > tmp_parent.group_right_bound) then
      return -3;
   end if;
  end if;

  -- Check if this is already his parent ... No problem, but nothing to do
  if (tmp_current.group_parent_id = p_new_parent_id) then
    return -4;
  end if;

  -- Get biggest right
  biggest_r = (select "group_right_bound" from "group" order by "group_right_bound" desc limit 1);

  -- Get current group to move size
  current_size = tmp_current.group_right_bound - tmp_current.group_left_bound;

  -- Before moving: memorize current group left and right bound
  current_l = tmp_current.group_left_bound;
  current_r = tmp_current.group_right_bound;
  move_level = tmp_parent.group_level + 1 - tmp_current.group_level;
  move_offset =  biggest_r + 1 - current_l;

  -- Move group to a temporary area (at max+1)
  update "group" set ("group_right_bound", "group_left_bound") = (group_right_bound + move_offset, group_left_bound + move_offset)
    where "group_right_bound" <= current_r
    and "group_left_bound" >= current_l;

  -- Update all below current left and then all bellow parent
  -- move all below current
  update "group" set ("group_left_bound") = (group_left_bound - current_size - 1)
    where "group_left_bound" > current_l
    and "group_left_bound" <= biggest_r;
  update "group" set ("group_right_bound") = (group_right_bound - current_size - 1)
    where "group_right_bound" > current_l
    and "group_right_bound" <= biggest_r;
  -- After move : get new parent position (we can compute it but it's safer to get it)
  select * into tmp_parent from "group" where "group_id" = p_new_parent_id;

  -- move all below parent right (including parent right)
  update "group" set ("group_left_bound") = (group_left_bound + current_size + 1)
    where "group_left_bound" > tmp_parent.group_right_bound
    and "group_left_bound" <= biggest_r;
  update "group" set ("group_right_bound") = (group_right_bound + current_size + 1)
    where "group_right_bound" >= tmp_parent.group_right_bound -- Here we include group right bound
    and "group_right_bound" <= biggest_r;
  -- After move : get new parent position (we can compute it but it's safer/easier to get it)
  select * into tmp_parent from "group" where "group_id" = p_new_parent_id;

  -- Compute move group (that is now in max+1 position) to it's new place
  move_offset = biggest_r + 1 - tmp_parent.group_right_bound + current_size + 1;
  --RAISE NOTICE 'MOVE : %+1 - % - %+1 = %', biggest_r, tmp_parent.group_right_bound, current_size, move_offset;--DEBUG

  -- Set level for group to move
  update "group" set ("group_level") = (group_level + move_level)
    where  "group_right_bound" > biggest_r; -- update level only for group to move

  -- Execute move group
  update "group" set ("group_left_bound", "group_right_bound") = (group_left_bound - move_offset , group_right_bound - move_offset)
    where "group_left_bound" > biggest_r;

  -- Set parent_id for group to move
  update "group" set ("group_parent_id") = (p_new_parent_id)
    where "group_id" = p_id;

  -- Return group id moved
  return p_id;
end;
$$
language plpgsql;
alter function "group_move_group"(integer, integer) owner to "pyramid_demo";

That’s a bit of code, so let’s test it : if we have “Reptile” group we should have “Mammal” group too.

Let’s add a “Mammal” group and move “Feline” inside (this move will also move all Feline child).

start transaction ISOLATION LEVEL SERIALIZABLE;
  select group_add_group((select "group_id" from "group" where "group_name"='Living on earth' limit 1), 'Mammal'); -- Create Mammal group
  select group_move_group((select "group_id" from "group" where "group_name"='Feline' limit 1), (select "group_id" from "group" where "group_name"='Mammal' limit 1)); -- Move Feline inside Mammal
commit;

And now Display_query result is :

Living on earth
.Reptile
..Crocodile
.Mammal
..Feline
...Tiger
...Cat
...Leopard
....African leopard
....Indian leopard
....Arabian leopard
....Amur leopard
...Lion

Bonus check a group

Here a simple function to check left_bound and right_bound of one group (check that it don’t mess with other group). You shouldn’t need it, but during developpement I used it :

-- Chek a group
-- check group left/right and level
--
-- IMPORTANT: Should be called in a start transaction SERIALIZABLE, as it need to be atomic on the whole table "group"
-- don't use other level, just SERIALIZABLE
-- example : check group id 4 :
--  start transaction ISOLATION LEVEL SERIALIZABLE;
--    select group_check_group(4);
--  commit;
--
-- p_id :         group id to check
-- return :
--    id if group check ok (this is > 0),
--    -1 if group check not ok
--    -2 if element does not exist,
--    -3 if size of group is not ok
--    -4 if right bound or parent_id of group is not ok
--    -5 if left bound or parent_id of group is not ok
--    -6 if level or parent_id of group is not ok
--    -42 if element is root
create or replace function "group_check_group"(p_id integer)
returns integer
as $$
declare
  tmp_current "group";
  current_size integer;
  biggest_r integer;
  biggest_l integer;
  return_id integer;
begin
  return_id = -1;

  -- Check group to check exist
  if (0 = (select count ("group_id") from "group" where ("group_id" = p_id)) ) then
    return -2;
  end if;

  -- Check element has parent (if he don't have it is the root element = check nothing)
  --TODO: check that all groups are in right/left bound of root element
  select * into tmp_current from "group" where "group_id" = p_id;
  if (tmp_current.group_parent_id IS NULL) then
    return -42;
  end if;

  -- Get current size group
  current_size = tmp_current.group_right_bound - tmp_current.group_left_bound;

  -- Get biggest right
  biggest_r = (select "group_right_bound" from "group" where ("group_parent_id" = p_id) order by "group_right_bound" desc limit 1);

  -- Get biggest left
  biggest_l = (select "group_left_bound" from "group" where ("group_parent_id" = p_id) order by "group_left_bound" limit 1);

  -- Check group level, left and right bound according to the parent group
  if ( (select "group_level" from "group" where "group_id" = tmp_current.group_parent_id) + 1 = tmp_current.group_level) then
    if ( (select "group_left_bound" from "group" where "group_id" = tmp_current.group_parent_id) < tmp_current.group_left_bound) then
      if ( (select "group_right_bound" from "group" where "group_id" = tmp_current.group_parent_id) > tmp_current.group_right_bound) then
        if (current_size = 1) then  --no child for this group
          -- Set return id
          return_id = p_id;
        elsif (current_size = biggest_r - biggest_l + 2) then
          -- Set return id
          return_id = p_id;
        else
          return -3;
        end if;
      else
        return -4;
      end if;
    else
      return -5;
    end if;
  else
    return -6;
  end if;

  -- return group created (-1 for no group check)
  return return_id;
end;
$$
language plpgsql;
alter function "group_check_group"(integer) owner to "pyramid_demo";

Is it possible to do better using float for left/right bound ?

Perhaps you are wondering why not use floating numbers instead integers : doing so it is theorically possible to update only right_bound and left_bound of the element moved, added or deleted.

Because in this case you open the door to problems of representation (and precision) of floating. And this kind of problem is really hard to solve.

If you need only tree data, and plan to have very big tree the best maybe to use a graph database. But this is an area that I have not explored yet so I can not talk about it yet.

Conclusion

With some functions we can do simple SQL tree without any recursive code !

“Pyramidal stack” “Nested set” is very handy and will comme with a huge perfomance boost over recursive tree and is preferable for :

  • a deep tree (more than 3 leafs of depth)
  • if the tree is very frequently read but not frequently written