Rust - Simple graphs algorithms

By Tomasz Kuczma

December 27, 2022

Graphs (and trees - their special case) are data structures used in modeling complex problems such as finding an optimal route, exploring possible moves in a game, caching engines, modeling relations and more . That’s why they are very often the fundament of software engineer interview tasks in big companies - FAANG. Let’s implement some basic graph algorithms in Rust.

Rust - single ownership and self reference

Rust is a great programming language that achieves memory safety by forcing a single ownership model and borrow-checker mechanism. That is actually problematic for data structures that require referencing Self type e.g. graphs, trees, and double-linked lists. So it is impossible in Rust to express this:

pub struct Node {
    id: u32,
    next: Node, // compile error: recursive type `Node` has infinite size
}

We can solve that by either using reference (and playing with lifetimes) or by using heap-based data structure for next like smart pointer Rc/Arc or Vec:

pub struct TreeNode {
    id: u32,
    next: Vec<Node>, // for trees, we can express single ownership: parent owns children
}

pub struct RcGraphNode {
    id: u32,
    next: Vec<Rc<RcGraphNode>>, // for graphs, we can express shared ownership e.g. with `Rc`
}

pub struct RefGraphNode<'a> {
    id: u32,
    next: Vec<&'a RefGraphNode<'a>>, // alternatively, we can just use references and let somebody else 
}

Let’s write depth-first search (DFS) and breadth-first search (BFS) algorithms using Rc and references.

DFS/BFS with referance based graph

Basic graph structs looks like:

use std::collections::{HashSet, VecDeque};
use std::slice::Iter;

#[derive(Debug, Eq, PartialEq, Hash)]
pub struct Edge<'a> {
    pub weight: u32,
    pub destination_node: &'a Node<'a>,
}

impl<'a> Edge<'a> {
    pub fn new(weight: u32, destination_node: &'a Node<'a>) -> Edge {
        Edge {
            weight,
            destination_node,
        }
    }
}

// Should use better `PartialEq` and `Hash` implementation (id based only)
#[derive(Debug, Eq, PartialEq, Hash)]
pub struct Node<'a> {
    pub id: u32,
    pub children: Vec<Edge<'a>>,
}

impl<'a> Node<'a> {
    pub fn new(id: u32, children: Vec<Edge<'a>>) -> Node {
        Node { id, children }
    }

    pub fn children_edges_iter(&self) -> Iter<'_, Edge<'a>> {
        self.children.iter()
    }
}

Note that, weight is not needed in DFS and BFS but could be useful for other algorithms like Dijkstra’s shortest path.

As DFS and BFS are basically the same algorithms (the only difference is a queue that we use) we can define some trait to represent a common interface:

pub trait NodeQueue<'a> {
    fn enqueue(&mut self, node: &'a Node<'a>);
    fn dequeue(&mut self) -> Option<&'a Node<'a>>;
}

#[derive(Default)]
struct FifoNodeQueue<'a> {
    data: VecDeque<&'a Node<'a>>,
}

impl<'a> NodeQueue<'a> for FifoNodeQueue<'a> {
    fn enqueue(&mut self, node: &'a Node<'a>) {
        self.data.push_back(node)
    }

    fn dequeue(&mut self) -> Option<&'a Node<'a>> {
        self.data.pop_front()
    }
}

#[derive(Default)]
struct LifoNodeQueue<'a> {
    data: VecDeque<&'a Node<'a>>,
}

impl<'a> NodeQueue<'a> for LifoNodeQueue<'a> {
    fn enqueue(&mut self, node: &'a Node<'a>) {
        self.data.push_back(node)
    }

    fn dequeue(&mut self) -> Option<&'a Node<'a>> {
        self.data.pop_back()
    }
}

Now, we can implement a generic algorithm that traversal all nodes from the start node respecting queue order:

struct Graph {}

impl Graph {
    pub fn dfs<F>(&self, start: &Node, f: F)
    where
        F: Fn(&Node),
    {
        self.iterate_all(start, LifoNodeQueue::default(), f)
    }

    pub fn bfs<F>(&self, start: &Node, f: F)
    where
        F: Fn(&Node),
    {
        self.iterate_all(start, FifoNodeQueue::default(), f)
    }

    fn iterate_all<'a, F, Q>(&self, start: &'a Node, mut to_visit: Q, f: F)
    where
        F: Fn(&Node),
        Q: NodeQueue<'a>,
    {
        let mut already_visited: HashSet<&Node> = HashSet::new();

        to_visit.enqueue(start);
        already_visited.insert(start);

        while let Some(current_node) = to_visit.dequeue() {
            f(&current_node);

            for edge in current_node.children_edges_iter() {
                let child = edge.destination_node;
                if already_visited.insert(child) {
                    to_visit.enqueue(child);
                }
            }
        }
    }
}

F is a function that acts as a simple node visitor pattern.

Worth noticing is that, we use references to store both: visited nodes in HashSet and nodes to visit in VecDeque (bidirectional queue implemented as ring-buffer based on Vec). So we have zero-copy overhead there (just maintenance of data structure).

In a simple test, we can express ownership using individual variables or constants. In real use case, we would have to store it somewhere e.g. in HashMap or Vec:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        let n5 = Node::new(5, vec![]);
        let n4 = Node::new(4, vec![Edge::new(1, &n5)]);
        let n3 = Node::new(3, vec![Edge::new(1, &n4), Edge::new(1, &n5)]);
        let n2 = Node::new(2, vec![Edge::new(1, &n4), Edge::new(1, &n3)]);
        let n1 = Node::new(1, vec![Edge::new(1, &n2), Edge::new(1, &n3)]);

        let graph = Graph {};

        println!("DFS");
        graph.dfs(&n1, |node| println!("Visited {:?}", node));

        println!("BFS");
        graph.bfs(&n1, |node| println!("Visited {:?}", node));
    }
}

DFS/BFS with `Rc` based graph

#[derive(Debug, Eq, PartialEq, Hash)]
pub struct Edge {
    pub weight: u32,
    pub destination_node: Rc<Node>,
}

#[derive(Debug, Eq, PartialEq, Hash)]
pub struct Node {
    pub id: u32,
    pub children: Vec<Edge>,
}

// `Node`, `Edge` - simple impl
// `NodeQueue` impl is the same if we want to keep it ref based,
// otherwise we just need to clone Rc..

struct Graph {}

impl Graph {
    pub fn dfs<F>(&self, start: &Node, f: F)
    where
        F: Fn(&Node),
    {
        self.iterate_all(start, LifoNodeQueue::default(), f)
    }

    pub fn bfs<F>(&self, start: &Node, f: F)
    where
        F: Fn(&Node),
    {
        self.iterate_all(start, FifoNodeQueue::default(), f)
    }

    fn iterate_all<'a, F, Q>(&self, start: &'a Node, mut to_visit: Q, f: F)
    where
        F: Fn(&Node),
        Q: NodeQueue<'a>,
    {
        let mut already_visited: HashSet<&Node> = HashSet::new();

        to_visit.enqueue(start);
        // or `to_visit.enqueue(start.clone());` if we want to keep Rc<Node> in queue
        already_visited.insert(start);

        while let Some(current_node) = to_visit.dequeue() {
            f(&current_node);

            for edge in current_node.children_edges_iter() {
                let child = edge.destination_node.as_ref();
                // or use `clone()` if we want to keep Rc<Node> in queue/set
                if already_visited.insert(child) {
                    to_visit.enqueue(child);
                }
            }
        }
    }
}

Test setup looks similar but expresses shared ownership:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        let n5 = Rc::new(Node::new(5, vec![]));
        let n4 = Rc::new(Node::new(4, vec![Edge::new(1, n5.clone())]));
        let n3 = Rc::new(Node::new(
            3,
            vec![Edge::new(1, n4.clone()), Edge::new(1, n5.clone())],
        ));
        let n2 = Rc::new(Node::new(
            2,
            vec![Edge::new(1, n4.clone()), Edge::new(1, n3.clone())],
        ));
        let n1 = Rc::new(Node::new(
            1,
            vec![Edge::new(1, n2.clone()), Edge::new(1, n3.clone())],
        ));

        let graph = Graph {};

        println!("DFS");
        graph.dfs(&n1, |node| println!("Visited {:?}", node));

        println!("BFS");
        graph.bfs(&n1, |node| println!("Visited {:?}", node));
    }
}

Summary

We can use both styles to write graph-based algorithms. Playing with lifetimes in graphs nodes is a little bit annoying and Rc could be a good tradeoff here.

For a full example see: https://github.com/Tuczi/rust-graphs

Article by Tomasz Kuczma

Software engineer with a passion. Interested in computer networks and large-scale distributed computing. He loves to optimize and simplify software on various levels of abstraction starting from memory ordering through non-blocking algorithms up to system design and end-user experience. Geek. Linux user.

The views I express are my alone and they do not necessarily express the views of my employer or ex-employers. They are not investment advice nor based on any non-public information of any kind. Poglądy, które wyrażam, są tylko moje i niekoniecznie wyrażają opinie mojego pracodawcy lub byłych pracodawców. Nie są poradami inwestycyjnymi ani nie opierają się na jakichkolwiek niepublicznych informacjach.

Rust - single ownership and self reference

DFS/BFS with referance based graph

DFS/BFS with Rc based graph

Summary

DFS/BFS with `Rc` based graph