🐪

Implementing a binary heap in Ocaml

I didn’t come from a “traditional” software engineering background. I’ve never received formal training in anything software related.

This becomes particularly apparent when it comes to algorithms and datastructures. A big scary, meaty topic. And I’m a vegetarian!

My typical flow for learning concepts in this crucial area of software engineering goes a little something like this:

Conor reads about something he’s never seen before
Conor imagines how such a concept could be useful in his own projects
Conor scratches his head at the black magic one would have to perform to achieve this concept in code

If I’m lucky at this point something will click and its just a case of using the newly discovered knowledge enough times that it doesn’t fade by breakfast the next day.

Algorithms and data structures have historically been a gap in my knowledge

Enter the binary heap

I recently came across the Binary Heap data structure when perusing the code for TinyVector. Its a Rust based project, and it was calling the standard library BinaryHeap type to efficiently order a set of items by priority.

Seeing this heap in the wild got me to thinking, “what the devil is this thing and how might I use it myself”.

The interweb has some pretty good resources on the what of binary heaps which really helped:

Wikipedia entry
Excellent Youtube video describing what a Binary heap is and how it functions
A great Algoviz visualisation

Let’s get implementing

I picked OCaml to implement my own binary heap, as its a language I’ve wanted to sink deeper into for a good while.

Its functional and strongly typed, but the typing feels very implicit and doesn’t require much manual defining.

Let’s start with our basic binary_tree type

type 'a binary_tree = 
  | Empty 
  | Node of 'a * 'a binary_tree * 'a binary_tree;;

What is this saying?

Well, a binary_tree with generic 'a type can be either Empty or a Node.

Each Node then contains 3 elements/arguments these are separated by * characters in the OCaml syntax;

An 'a (that’s our generic “anything” value which the tree wraps)
A left binary tree (of our generic type)
A right binary tree

So its a type which calls itself recursively. Nice!

Inserting values into the tree

To add values into our heap, we need a way of prioritising them. Since in my head the value should be super generic - able to take any kind of data to be sorted into our heap, I thought a great structure for each item would be a tuple of (priority, value). That way priorities can be simple int types which are easy to compare etc.

Lets write a single insert function which takes a heap and adds a Node containing the given value.

import Note from ’../../components/markdown/Note.astro’

In OCaml you can immediately pattern match on the final argument of the function.

This means that instead of this:

let foo value =
 match value with
 | Bar -> "this is ok"
 | Baz -> "have to repeat the value twice tho!"

you can write this

let foo = function 
| Bar -> "wow right?"
| Baz -> "check that brevity!"

This style of function just feels so magical!

Our insert is going to take a tuple of the priority and the actual value, and then immediately pattern match on the provided Heap, since the behaviour of insert is very much dependent on the current shape of the Heap.

Also it will need to be marked as recursive, so that it can call itself!

let rec insert (priority, value) = function

What shapes can the heap take?

Well it could either be Empty or a Node, so these are our pattern matched options

|  Empty -> Node ((priority, value), Empty, Empty)
|  Node ((p, v), left, right) ->

When Empty, we just create the first Node in the heap, with Empty left and right leaf nodes.

If there is a Node (that will be the head of the heap), then we destructure whats inside giving us access to the contained (priority, value) tuple, as well as the child leaf nodes.

Its then a case of comparing the priorities of the top node with the arguments, if the new one is less than the current one (ie. we pass in a priority 1 and the current node is priority 2, 1 is more important) we swap them, and recursively call our insert function on one of the leaf nodes. Otherwise we do the opposite, recursively insert using the argument value.

if priority < p then 
    Node ((priority, value), insert (p, v) right, left)
else 
    Node ((p, v), insert (priority, value) right, left)

Pulling everything together, here’s what we’ve got:

let rec insert (priority, value) = function
|  Empty -> Node ((priority, value), Empty, Empty)
|  Node ((p, v), left, right) -> 
  if priority < p then 
    Node ((priority, value), insert (p, v) right, left)
  else 
    Node ((p, v), insert (priority, value) right, left);;

Peeking the top value

In a binary tree (or priority queue), we are interested only in the value of the top Node.

With a bit of pattern matching magic we can pluck out the value the Node contains

Remember that a Node contains 3 arguments

The value the Node holds

The left Node child

The right Node child

let peek = function
  | Empty -> None
  | Node ((_, value), _, _) -> Some (value);;

Popping off the top

Queue the reshuffle!

To remove items from Binary Heaps you start at the first node, the top of the tree, and pop it off leaving a headless tree of nodes. We then pick the highest value child node and promote it to be the new head of the tree.

This keeps the top item as the highest priority item.

Here’s one way you might implement it in our OCaml module.

let rec pop = function 
  | Empty -> Empty
  | Node (_, left, right) -> 
      match (left, right) with
      | (Empty, _) -> right
      | (_, Empty) -> left
      | (Node ((pl, vl), _, _), Node((pr, vr), _, _)) ->
          if pl > pr then 
            Node ((pl, vl), pop left, right)
          else 
            Node ((pr, vr), left, pop right);;

Whoa right?

Just break it down line by line and its less scary, and all just about dipping in and destructuring the required values to determine which is the highest priority, and then popping again on the remaining leaf nodes.

Testing in the REPL

OCaml has the utop tool for running code in a REPL style. You can import your own modules using the #use "myfile.ml" expression.

Let’s fire up the utop REPL and create a binary heap.

let heap =
 Empty
 |> insert (2, "second one")
 |> insert (1, "should be top")
 |> insert (3, "lesser")
 |> insert (8, "way down")
 |> insert (2, "another 2?!") ;;

This will print out the Node tree looking similar to this

val heap : (int * string) binary_tree =
  Node ((1, "should be top"),
   Node ((2, "another 2?!"), Node ((3, "lesser"), Empty, Empty), Empty),
   Node ((2, "second one"), Node ((8, "way down"), Empty, Empty), Empty))

Now we can test our peek function

peek heap;;
(* 
- : string option = Some "should be top"
*)

We can see that the highest priority is indeed on top!

Wrapping our datastructure in a module

In OCaml, you can be very specific about types and functions a module exports. These definitions are placed in a .mli interface file (named the same as your module).

(* binary_heap.mli *)
type 'a binary_tree

val insert : 'a * 'b -> ('a * 'b) binary_tree -> ('a * 'b) binary_tree

val peek : ('a * 'b) binary_tree -> 'b option

val pop : ('a * 'b) binary_tree -> ('a * 'b) binary_tree

Wait, no specific type definition for the binary_tree? Why don’t we use the type we specified earlier?

This is actually a really really cool OCaml feature. You can definite the types in a module’s interface by name only. This means that from the outside, calling code can only interact with your module via the functions which you provide it, but further than that, you could change the entire implementation of the module, including the main type which it revolves around, without any change to the caller.

It would be non breaking for us to force the priority field to be a vector rather than a number.

How crazy is that?!