# Implementing a binary heap in Ocaml

I didnāt come from a *ātraditionalā* software engineering background. Iāve never received formal training in anything software related.

This becomes particularly apparent when it comes to algorithms and datastructures. A big scary, meaty topic. And Iām a vegetarian!

My typical flow for learning concepts in this crucial area of software engineering goes a little something like this:

- Conor reads about something heās never seen before
- Conor imagines how such a concept could be useful in his own projects
- Conor scratches his head at the black magic one would have to perform to achieve this concept in code

If Iām lucky at this point something will click and its just a case of using the newly discovered knowledge enough times that it doesnāt fade by breakfast the next day.

Algorithms and data structures have historically been a gap in my knowledge

## Enter the binary heap

I recently came across the Binary Heap data structure when perusing the code for TinyVector. Its a Rust based project, and it was calling the standard library `BinaryHeap`

type to efficiently order a set of items by priority.

Seeing this heap in the wild got me to thinking, āwhat the devil is this thing and how might I use it myselfā.

The interweb has some pretty good resources on the what of binary heaps which really helped:

- Wikipedia entry
- Excellent Youtube video describing what a Binary heap is and how it functions
- A great Algoviz visualisation

## Letās get implementing

I picked OCaml to implement my own binary heap, as its a language Iāve wanted to sink deeper into for a good while.

Its functional and strongly typed, *but* the typing feels very implicit and doesnāt require much manual defining.

Letās start with our basic `binary_tree`

type

```
type 'a binary_tree =
| Empty
| Node of 'a * 'a binary_tree * 'a binary_tree;;
```

What is this saying?

Well, a `binary_tree`

with generic `'a`

type can be either
`Empty`

or a `Node`

.

Each `Node`

then contains 3 elements/arguments these are separated by `*`

characters in the OCaml syntax;

- An
`'a`

(thatās our generic āanythingā value which the tree wraps) - A left binary tree (of our generic type)
- A right binary tree

So its a type which calls itself recursively. Nice!

### Inserting values into the tree

To add values into our heap, we need a way of prioritising them. Since in my head the value should be super generic - able to take any kind of data to be sorted into our heap, I thought a great structure for each item would be a tuple of `(priority, value)`

. That way priorities can be simple `int`

types which are easy to compare etc.

Lets write a single `insert`

function which takes a heap and adds a `Node`

containing the given `value`

.

Our `insert`

is going to take a tuple of the `priority`

and the actual `value`

, and then immediately pattern match on the provided Heap, since the behaviour of insert is very much dependent on the current shape of the Heap.

Also it will need to be marked as recursive, so that it can call itself!

`let rec insert (priority, value) = function`

What shapes can the heap take?

Well it could either be `Empty`

or a `Node`

, so these are our pattern matched options

```
| Empty -> Node ((priority, value), Empty, Empty)
| Node ((p, v), left, right) ->
```

When `Empty`

, we just create the first `Node`

in the heap, with `Empty`

left and right leaf nodes.

If there is a `Node`

(that will be the head of the heap), then we destructure whats inside giving us access to the contained `(priority, value)`

tuple, as well as the child leaf nodes.

Its then a case of comparing the priorities of the top node with the arguments, if the new one is less than the current one (ie. we pass in a priority 1 and the current node is priority 2, 1 is more important) we swap them, and recursively call our insert function on one of the leaf nodes. Otherwise we do the opposite, recursively insert using the argument value.

```
if priority < p then
Node ((priority, value), insert (p, v) right, left)
else
Node ((p, v), insert (priority, value) right, left)
```

Pulling everything together, hereās what weāve got:

```
let rec insert (priority, value) = function
| Empty -> Node ((priority, value), Empty, Empty)
| Node ((p, v), left, right) ->
if priority < p then
Node ((priority, value), insert (p, v) right, left)
else
Node ((p, v), insert (priority, value) right, left);;
```

### Peeking the top value

In a binary tree (or priority queue), we are interested only in the value of the top Node.

With a bit of pattern matching magic we can pluck out the value the Node contains

Remember that a

`Node`

contains 3 arguments

- The value the Node holds
- The left Node child
- The right Node child

```
let peek = function
| Empty -> None
| Node ((_, value), _, _) -> Some (value);;
```

### Popping off the top

Queue the reshuffle!

To remove items from Binary Heaps you start at the first node, the top of the tree, and pop it off leaving a headless tree of nodes. We then pick the highest value child node and promote it to be the new head of the tree.

This keeps the top item as the highest priority item.

Hereās one way you might implement it in our OCaml module.

```
let rec pop = function
| Empty -> Empty
| Node (_, left, right) ->
match (left, right) with
| (Empty, _) -> right
| (_, Empty) -> left
| (Node ((pl, vl), _, _), Node((pr, vr), _, _)) ->
if pl > pr then
Node ((pl, vl), pop left, right)
else
Node ((pr, vr), left, pop right);;
```

Whoa right?

Just break it down line by line and its less scary, and all just about dipping in and destructuring the required values to determine which is the highest priority, and then popping again on the remaining leaf nodes.

## Testing in the REPL

OCaml has the `utop`

tool for running code in a REPL style. You can import your own modules using the `#use "myfile.ml"`

expression.

Letās fire up the `utop`

REPL and create a binary heap.

```
let heap =
Empty
|> insert (2, "second one")
|> insert (1, "should be top")
|> insert (3, "lesser")
|> insert (8, "way down")
|> insert (2, "another 2?!") ;;
```

This will print out the Node tree looking similar to this

```
val heap : (int * string) binary_tree =
Node ((1, "should be top"),
Node ((2, "another 2?!"), Node ((3, "lesser"), Empty, Empty), Empty),
Node ((2, "second one"), Node ((8, "way down"), Empty, Empty), Empty))
```

Now we can test our `peek`

function

```
peek heap;;
(*
- : string option = Some "should be top"
*)
```

We can see that the highest priority is indeed on top!

## Wrapping our datastructure in a module

In OCaml, you can be very specific about types and functions a module exports. These definitions are placed in a `.mli`

interface file (named the same as your module).

```
(* binary_heap.mli *)
type 'a binary_tree
val insert : 'a * 'b -> ('a * 'b) binary_tree -> ('a * 'b) binary_tree
val peek : ('a * 'b) binary_tree -> 'b option
val pop : ('a * 'b) binary_tree -> ('a * 'b) binary_tree
```

Wait, no specific type definition for the `binary_tree`

? Why donāt we use the type we specified earlier?

This is actually a really really cool OCaml feature. You can definite the types in a moduleās interface by name only. This means that from the outside, calling code can only interact with your module via the functions which you provide it, but further than that, you could change the entire implementation of the module, including the main type which it revolves around, without any change to the caller.

It would be non breaking for us to force the priority field to be a vector rather than a number.

How crazy is that?!