A* Pathfinding with Softmax

Using softmax, A* search can generate diverse candidates for evaluation. Code on GitHub.

Motivation

With deterministic pathfinding, you can find an optimal path, provided the route's total cost is the sum of the cost of its edges.

But suppose you can't evaluate a route's preferability without a holistic view.

This resembles LLM text generation, in which tokens are selected step-by-step, but generations are best judged in whole. With test-time compute solutions, numerous candidates are compared against each other.

If we can generate diverse path candidates, we can use test-time compute in pathfinding.

Strategy

A* search is a pathfinding algorithm developed for Shakey the Robot and published in 1968. Today, it's everyone's first routing algorithm thanks to its popularity in game development tutorials.

The Softmax function from Boltzmann's 1868 paper gives us an easy way to turn logits into probability distributions. Today we find it in ChatGPT and the like as the final activation function, letting us pick tokens semi-randomly for diverse generations.

Let's combine them.

Classic A*

What's important about A* here is you have a cost function, and a heuristic.

The heuristic should be commensurate with the cost function, and should never over estimate the cost.

Often, this means euclidean distance of an edge for cost, and euclidean distance from a candidate to the goal as a heuristic. You can never really go "straight" to your target, so the heuristic isn't realistic, but you can never do better, so it never over-estimates.

You'll need to try it to "get" it, but for reference, I've paraphrased Wikipedia's pseudocode here.

openSet := {start}
cameFrom := an empty map

gScore := map with high default
gScore[start] := 0

fScore := map
fScore[start] := h(start)

while openSet is not empty
  here := node from openSet with lowest fScore
  if here = goal
    # Done 

  openSet.Remove(here)
  for each neighbor of here 
    tentative_gScore :=
      gScore[here] + d(here, neighbor)
    if tentative_gScore < gScore[neighbor]
      cameFrom[neighbor] := here 
      gScore[neighbor] := tentative_gScore
      fScore[neighbor] :=
        tentative_gScore + h(neighbor)
      if neighbor not in openSet
        openSet.add(neighbor)

Easy to use. Makes a nice path:

A graph of nodes with edges connecting between them, in which a single path is illuminated from one end to the other. — A* gives a clean, deterministic path.

The problem, is "cost" is assumed to be the sum of the costs of each leg of the journey. If we need a "click to go" interface in Civilization, that's fine, but in real world scenarios, earlier legs of a journey can effect later legs in suprising ways, and we can't estimate the cost of a route unless we have it in entirely in view.

If we can produce diverse, plausible candidates, we can evaluate as many as we like.

Changing Stuff

I wasn't sure where I could introduce a "sampling" concept, but decided to try the first line of the loop body.

The first line reads, here := node from openSet with lowest fScore. That smells like a distribution we can sample from. Generally, it's the accumulated cost from the origin to our jumping-off point, plus the heuristic, and lower is better.

Suppose we have three points in the open set scored [5, 9, 25]. In A* we'd treat that five as the best lead. Instead, I'd like to select the lead randomly, but prefer that five.

Softmax will handle negatives, so in principle you can just flip them to [-5, -9, -25], making that first figure the "highest". But the distribution's very sharp. In practice, I found it put 100% odds on the first item, every time.

Next, I tried a simple division. 5 divided by 5 is one, and 5 divided by any of the other numbers is a lower figure in range [0,1). This guarantees somewhat "flat" figures, which we can bend up and down with a temperature if we like.

scores := open set relevant scores
logits := min(scores) / scores
logits := logits / temperature
probs := softmax(logits)
index := draw(probs)
here := node for the given index

Updated pseudocode for the first step in the while loop.

You can see complete demo code on GitHub in the form of a Python Jupyter Notebook.

Heat Maps

This will give a different route each time, so I've run generations in batches and render heat maps of edge traversal. Let's have a look with different temperatures.

A graph showing a low-temperature heat map, in which most routes travel the same course. — 128 paths, temperature 0.1. Paths mostly stick to the optimal course.

A graph showing a higher heat map. Routes scatter more. — 128 paths, temperature 0.5. Paths begin to scatter.

With a further increase to the temperature, routes scatter even more. — 128 paths, temperature 1.0. Even very roundabout courses are tried occasionally.

Conclusion

Honestly, other than a little debugging, it worked exactly as I hoped.

I've got a hunch I'll find a more elegant solution as I go over other common pathfinding algorithms. Can Dijkstra give cheap multiple candidates? When I find out, I'll write further.