Rust - Simple graphs algorithms
By Tomasz Kuczma
Graphs (and trees - their special case) are data structures used in modeling complex problems such as finding an optimal route, exploring possible moves in a game, caching engines, modeling relations and more . That’s why they are very often the fundament of software engineer interview tasks in big companies - FAANG. Let’s implement some basic graph algorithms in Rust.
Rust - single ownership and self reference
Rust is a great programming language that achieves memory safety by forcing a single ownership model and borrow-checker mechanism.
That is actually problematic for data structures that require referencing Self
type e.g. graphs, trees, and double-linked lists.
So it is impossible in Rust to express this:
pub struct Node {
id: u32,
next: Node, // compile error: recursive type `Node` has infinite size
}
We can solve that by either using reference (and playing with lifetimes) or by using heap-based data structure for next
like smart pointer Rc
/Arc
or Vec
:
pub struct TreeNode {
id: u32,
next: Vec<Node>, // for trees, we can express single ownership: parent owns children
}
pub struct RcGraphNode {
id: u32,
next: Vec<Rc<RcGraphNode>>, // for graphs, we can express shared ownership e.g. with `Rc`
}
pub struct RefGraphNode<'a> {
id: u32,
next: Vec<&'a RefGraphNode<'a>>, // alternatively, we can just use references and let somebody else
}
Let’s write depth-first search (DFS) and breadth-first search (BFS) algorithms using Rc
and references.
DFS/BFS with referance based graph
Basic graph struct
s looks like:
use std::collections::{HashSet, VecDeque};
use std::slice::Iter;
#[derive(Debug, Eq, PartialEq, Hash)]
pub struct Edge<'a> {
pub weight: u32,
pub destination_node: &'a Node<'a>,
}
impl<'a> Edge<'a> {
pub fn new(weight: u32, destination_node: &'a Node<'a>) -> Edge {
Edge {
weight,
destination_node,
}
}
}
// Should use better `PartialEq` and `Hash` implementation (id based only)
#[derive(Debug, Eq, PartialEq, Hash)]
pub struct Node<'a> {
pub id: u32,
pub children: Vec<Edge<'a>>,
}
impl<'a> Node<'a> {
pub fn new(id: u32, children: Vec<Edge<'a>>) -> Node {
Node { id, children }
}
pub fn children_edges_iter(&self) -> Iter<'_, Edge<'a>> {
self.children.iter()
}
}
Note that, weight is not needed in DFS and BFS but could be useful for other algorithms like Dijkstra’s shortest path.
As DFS and BFS are basically the same algorithms (the only difference is a queue that we use) we can define some trait
to represent a common interface:
pub trait NodeQueue<'a> {
fn enqueue(&mut self, node: &'a Node<'a>);
fn dequeue(&mut self) -> Option<&'a Node<'a>>;
}
#[derive(Default)]
struct FifoNodeQueue<'a> {
data: VecDeque<&'a Node<'a>>,
}
impl<'a> NodeQueue<'a> for FifoNodeQueue<'a> {
fn enqueue(&mut self, node: &'a Node<'a>) {
self.data.push_back(node)
}
fn dequeue(&mut self) -> Option<&'a Node<'a>> {
self.data.pop_front()
}
}
#[derive(Default)]
struct LifoNodeQueue<'a> {
data: VecDeque<&'a Node<'a>>,
}
impl<'a> NodeQueue<'a> for LifoNodeQueue<'a> {
fn enqueue(&mut self, node: &'a Node<'a>) {
self.data.push_back(node)
}
fn dequeue(&mut self) -> Option<&'a Node<'a>> {
self.data.pop_back()
}
}
Now, we can implement a generic algorithm that traversal all nodes from the start
node respecting queue order:
struct Graph {}
impl Graph {
pub fn dfs<F>(&self, start: &Node, f: F)
where
F: Fn(&Node),
{
self.iterate_all(start, LifoNodeQueue::default(), f)
}
pub fn bfs<F>(&self, start: &Node, f: F)
where
F: Fn(&Node),
{
self.iterate_all(start, FifoNodeQueue::default(), f)
}
fn iterate_all<'a, F, Q>(&self, start: &'a Node, mut to_visit: Q, f: F)
where
F: Fn(&Node),
Q: NodeQueue<'a>,
{
let mut already_visited: HashSet<&Node> = HashSet::new();
to_visit.enqueue(start);
already_visited.insert(start);
while let Some(current_node) = to_visit.dequeue() {
f(¤t_node);
for edge in current_node.children_edges_iter() {
let child = edge.destination_node;
if already_visited.insert(child) {
to_visit.enqueue(child);
}
}
}
}
}
F
is a function that acts as a simple node visitor pattern.
Worth noticing is that, we use references to store both: visited nodes in HashSet
and nodes to visit in VecDeque
(bidirectional queue implemented as ring-buffer based on Vec
). So we have zero-copy overhead there (just maintenance of data structure).
In a simple test, we can express ownership using individual variables or constants. In real use case, we would have to store it somewhere e.g. in HashMap
or Vec
:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_works() {
let n5 = Node::new(5, vec![]);
let n4 = Node::new(4, vec![Edge::new(1, &n5)]);
let n3 = Node::new(3, vec![Edge::new(1, &n4), Edge::new(1, &n5)]);
let n2 = Node::new(2, vec![Edge::new(1, &n4), Edge::new(1, &n3)]);
let n1 = Node::new(1, vec![Edge::new(1, &n2), Edge::new(1, &n3)]);
let graph = Graph {};
println!("DFS");
graph.dfs(&n1, |node| println!("Visited {:?}", node));
println!("BFS");
graph.bfs(&n1, |node| println!("Visited {:?}", node));
}
}
DFS/BFS with Rc
based graph
#[derive(Debug, Eq, PartialEq, Hash)]
pub struct Edge {
pub weight: u32,
pub destination_node: Rc<Node>,
}
#[derive(Debug, Eq, PartialEq, Hash)]
pub struct Node {
pub id: u32,
pub children: Vec<Edge>,
}
// `Node`, `Edge` - simple impl
// `NodeQueue` impl is the same if we want to keep it ref based,
// otherwise we just need to clone Rc..
struct Graph {}
impl Graph {
pub fn dfs<F>(&self, start: &Node, f: F)
where
F: Fn(&Node),
{
self.iterate_all(start, LifoNodeQueue::default(), f)
}
pub fn bfs<F>(&self, start: &Node, f: F)
where
F: Fn(&Node),
{
self.iterate_all(start, FifoNodeQueue::default(), f)
}
fn iterate_all<'a, F, Q>(&self, start: &'a Node, mut to_visit: Q, f: F)
where
F: Fn(&Node),
Q: NodeQueue<'a>,
{
let mut already_visited: HashSet<&Node> = HashSet::new();
to_visit.enqueue(start);
// or `to_visit.enqueue(start.clone());` if we want to keep Rc<Node> in queue
already_visited.insert(start);
while let Some(current_node) = to_visit.dequeue() {
f(¤t_node);
for edge in current_node.children_edges_iter() {
let child = edge.destination_node.as_ref();
// or use `clone()` if we want to keep Rc<Node> in queue/set
if already_visited.insert(child) {
to_visit.enqueue(child);
}
}
}
}
}
Test setup looks similar but expresses shared ownership:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_works() {
let n5 = Rc::new(Node::new(5, vec![]));
let n4 = Rc::new(Node::new(4, vec![Edge::new(1, n5.clone())]));
let n3 = Rc::new(Node::new(
3,
vec![Edge::new(1, n4.clone()), Edge::new(1, n5.clone())],
));
let n2 = Rc::new(Node::new(
2,
vec![Edge::new(1, n4.clone()), Edge::new(1, n3.clone())],
));
let n1 = Rc::new(Node::new(
1,
vec![Edge::new(1, n2.clone()), Edge::new(1, n3.clone())],
));
let graph = Graph {};
println!("DFS");
graph.dfs(&n1, |node| println!("Visited {:?}", node));
println!("BFS");
graph.bfs(&n1, |node| println!("Visited {:?}", node));
}
}
Summary
We can use both styles to write graph-based algorithms.
Playing with lifetimes in graphs nodes is a little bit annoying and Rc
could be a good tradeoff here.
For a full example see: https://github.com/Tuczi/rust-graphs
Software engineer with a passion. Interested in computer networks and large-scale distributed computing. He loves to optimize and simplify software on various levels of abstraction starting from memory ordering through non-blocking algorithms up to system design and end-user experience. Geek. Linux user.