Nesting Structures from Flat Indexes

I don't always use recursion; but when I do, I don't always use recursion.

Reading time: about 5 minutes (961 words).

SQL databases are still one of the best ways of storing relational data, but sometimes you hit a wall transferring your representation to or from a table based layout.

Say we have a table full of Foos, where some Foos may have other Foo parents. This can easily be represented in a table with some unique primary key id, a secondary key parent, which joins to the parent’s id, and whatever data Foo may hold. Retrieving this table into a rust data structure, we’d find ourselves with something like Vec<Foo>, where Foo is defined as

struct Foo {
    id: i32,
    parent: Option<i32>,
    data: String,
}

This isn’t really the best way to represent these data though, what you really want is more akin to

struct Bar {
    id: i32,
    data: String,
    children: Vec<Bar>,
}

This is a problem I came up against when writing the comment API for Oration. Each comment is inserted into the database without too much hassle. We know the parent id, since the frontend forwards us that information, and the unique primary key is simply auto-incremented. When a select statement returns all comment data for a particular thread, we get a vector of PrintedComments back.

// Pull data from DB for a given `path`. `conn` here is the connection to the backend database
let comments: Vec<PrintedComment> = PrintedComment::list(conn, path)?;

The complete PrintedComment struct looks like the following at the time of writing:

#[derive(Serialize, Queryable, Debug)]
/// Subset of the comments table which is to be sent to the frontend.
struct PrintedComment {
    /// Primary key.
    id: i32,
    /// Parent comment.
    parent: Option<i32>,
    /// Actual comment.
    text: String,
    /// Commenters author if given.
    author: Option<String>,
    /// Commenters email address if given.
    email: Option<String>,
    /// Commenters website if given.
    url: Option<String>,
    /// Commenters indentifier.
    hash: String,
    /// Timestamp of creation.
    created: NaiveDateTime,
    /// Number of likes a comment has recieved.
    likes: Option<i32>,
    /// Number of dislikes a comment has recieved.
    dislikes: Option<i32>,
}

Since we want to show comment threads to the user; having entries in a flat list, sorted effectively by date isn’t that helpful. We need a NestedComment layout:

#[derive(Serialize, Debug)]
/// Subset of the comments table which is to be nested and sent to the frontend.
pub struct NestedComment {
    /// Primary key.
    id: i32,
    /// Actual comment.
    text: String,
    /// Commenters author if given.
    author: Option<String>,
    /// Commenters indentifier.
    hash: String,
    /// Timestamp of creation.
    created: DateTime<Utc>,
    /// Comment children.
    children: Vec<NestedComment>,
    /// Total number of votes.
    votes: i32,
}

I’ll skip over some of the conversions here, but you can clearly see we compress the commenter’s contact details into author, and sum votes. Most importantly though, we need a way to implement the nesting.

We know that our comments will have at most one parent, and that parent will not be itself. In that sense, the tree structure we wish to build can be considered as a special case of a directed digraph, and the method to generate a set of NestedComments involves building such a graph of all PrintedComments; traversing it recursively, then returning sets of children at each level.

Bluss has written an excellent graph data structure library called petagraph from which we’ll build a petgraph::DiGraphMap—a directed graph that allows us to control the node ids. If a node has a parent, we ensure that exists in the graph and add an edge from the parent to the child. If it doesn’t have a parent, we know this is one of our top-level comments, so we stash it aside for later:

let mut graph = DiGraphMap::new();
let mut top_level_ids = Vec::new();

for comment in &comments {
    //For each comment, build a graph of parents and children
    graph.add_node(comment.id)

    //Generate edges if a relationship is found, stash as a root if not
    if let Some(parent_id) = comment.parent {
        graph.add_node(parent_id);
        graph.add_edge(parent_id, comment.id, ());
    } else {
        top_level_ids.push(comment.id);
    }
}

Next, we iterate over all of the top-level ids and collect our children:

//Run over all root comments, recursively filling their children as we go
let tree: Vec<_> = top_level_ids
    .into_iter()
    .map(|id| build_tree(&graph, id, &comments))
    .collect();

The build_tree function is the recursive core of the problem. The implementation first (again) calls the recursive map for all comments and collects direct children at this level into a vector. The ::new constructor on NestedComment is passed this populated list of children, building the complete struct for the given top-level id. If the child vector is empty, we know the current comment has no replies, and we can therefore pass an empty vector to the NestedComment constructor.

/// Construct a nested comment tree from the flat indexed data obtained from the database.
fn build_tree(graph: &DiGraphMap<i32, ()>, id: i32, comments: &[PrintedComment]) -> NestedComment {
    let children: Vec<NestedComment> = graph
        .neighbors(id)
        .map(|child_id| build_tree(graph, child_id, comments))
        .collect();

    //We can just unwrap here since the id value is always populated from a map over contents.
    let idx: usize = comments.iter().position(|c| c.id == id).unwrap();

    if !children.is_empty() {
        NestedComment::new(&comments[idx], children)
    } else {
        NestedComment::new(&comments[idx], Vec::new())
    }
}

At this point, our list is completely nested and ready to be pushed to the frontend without fear of your witless, snarky responses being incorrectly assigned to the wrong first! comment.

You can take a look at the complete implementation of NestedComment’s methods here. In the coming days I plan to write up some further caveats I uncovered when working with this data in Elm on the frontside of oration, so please at least attempt to contain your excitement until then.