A* Pathfinding with Softmax
Using softmax, A* search can generate diverse candidates for evaluation. Code on GitHub.
Motivation
With deterministic pathfinding, you can find an optimal path, provided the route's total cost is the sum of the cost of its edges.
But suppose you can't evaluate a route's preferability without a holistic view.
This resembles LLM text generation, in which tokens are selected step by step but generations are best judged as a whole. Test-time compute approaches lean on this: generate numerous candidates, then compare them against each other.
If we can generate diverse path candidates, we can use test-time compute in pathfinding.
Strategy
A* is a routing algorithm developed for Shakey the Robot and published in 1968. Today, it's everyone's first routing algorithm thanks to its popularity in game development tutorials.
The Softmax function from Boltzmann's 1868 paper gives us an easy way to turn logits into probability distributions. Today we find it in ChatGPT and the like as the final activation function, letting us pick tokens semi-randomly for diverse generations.
Let's combine them.
Classic A*
What's important about A* here is that you have a cost function, and a heuristic.
The heuristic should be commensurate with the cost function, and should never overestimate the cost.
Often, this means the Euclidean length of an edge for cost, and the Euclidean distance from a candidate to the goal as a heuristic. You can never really go "straight" to your target, so the heuristic isn't realistic, but you can also never do better, so it never overestimates.
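In Python, that cost/heuristic pairing is just straight-line distance (a quick sketch using only the standard library; the names are mine, not from the notebook):

```python
import math

def euclidean(a, b):
    """Straight-line distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Edge cost: euclidean(u, v). Heuristic: euclidean(candidate, goal).
print(euclidean((0, 0), (3, 4)))  # → 5.0
```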
You'll need to try it to "get" it, but for reference, I've paraphrased Wikipedia's pseudocode here.
openSet := {start}
cameFrom := an empty map
gScore := map with default value Infinity
gScore[start] := 0
fScore := map with default value Infinity
fScore[start] := h(start)

while openSet is not empty
    here := node from openSet with lowest fScore
    if here = goal
        return the path, walking cameFrom back from here  # Done
    openSet.Remove(here)
    for each neighbor of here
        tentative_gScore := gScore[here] + d(here, neighbor)
        if tentative_gScore < gScore[neighbor]
            cameFrom[neighbor] := here
            gScore[neighbor] := tentative_gScore
            fScore[neighbor] := tentative_gScore + h(neighbor)
            if neighbor not in openSet
                openSet.add(neighbor)
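For reference, here's that pseudocode as a minimal runnable Python sketch, assuming the graph is a dict mapping each node to a {neighbor: edge_cost} dict (the names here are my own, not from the notebook):

```python
import heapq

def a_star(graph, start, goal, h):
    """Classic A*: graph is {node: {neighbor: edge_cost}}, h is the heuristic."""
    open_set = [(h(start), start)]   # priority queue keyed on fScore
    came_from = {}
    g_score = {start: 0}             # cheapest known cost from start

    while open_set:
        _, here = heapq.heappop(open_set)
        if here == goal:
            # Done: walk cameFrom backwards to rebuild the route
            path = [here]
            while here in came_from:
                here = came_from[here]
                path.append(here)
            return path[::-1]
        for neighbor, cost in graph[here].items():
            tentative = g_score[here] + cost
            if tentative < g_score.get(neighbor, float("inf")):
                came_from[neighbor] = here
                g_score[neighbor] = tentative
                heapq.heappush(open_set, (tentative + h(neighbor), neighbor))
    return None  # no route exists

graph = {"A": {"B": 1, "C": 4}, "B": {"C": 1}, "C": {}}
print(a_star(graph, "A", "C", lambda n: 0))  # → ['A', 'B', 'C']
```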
Easy to use, and when put to work it makes a nice path:

The problem, though, is that "cost" is assumed to be the sum of the costs of each leg of the journey. If I need a "click to go" interface in Civilization, that's fine, but in real-world scenarios, earlier legs of a journey can affect later legs in surprising ways, and we can't estimate the cost of a route unless we have it entirely in view.
But if we can produce diverse, plausible candidates, we can evaluate as many as we like.
Changing Stuff
At first glance, I wasn't sure where I could introduce a "sampling" concept. I decided to try the first line of the loop body.
The first line reads here := node from openSet with lowest fScore. That smells like a distribution we can sample from. Generally, fScore is the accumulated cost from the origin to our jumping-off point, plus the heuristic, and lower is better.
Suppose we have three points in the open set scored 5, 9 and 25. Classical A* will always take the five; instead, I'd like to usually take the five, and sometimes take the others.
Softmax will handle negatives, and in principle you can just flip the signs: -5, -9 and -25 makes that first figure the "highest". But the resulting distribution is so sharp that I found it put essentially 100% of the odds on the first item, every time.
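To see how sharp that gets, here's a quick check with softmax written out by hand (in my runs the score gaps were larger, but the collapse is already visible with these small numbers):

```python
import math

def softmax(xs):
    m = max(xs)                            # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([-5, -9, -25])
# ~98% of the mass on the first item here; with larger score gaps,
# it collapses to effectively 100%.
print(probs)
```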
Next, I tried a simple division. 5 divided by 5 is one, and 5 divided by either of the larger numbers is a figure in (0, 1). This guarantees somewhat "flat" logits, which we can later bend sharper or flatter with a temperature if we like.
scores := fScores of the nodes in openSet
logits := min(scores) / scores    # element-wise
logits := logits / temperature
probs := softmax(logits)
index := sample(probs)
here := node at the sampled index
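In Python, that sampling step might look like this (a sketch, assuming all scores are positive; random.choices does the weighted draw):

```python
import math
import random

def softmax(xs):
    m = max(xs)                                  # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(scores, temperature=1.0):
    """Pick an open-set index, favoring (but not forcing) the lowest score."""
    logits = [min(scores) / s for s in scores]   # best score maps to 1.0
    logits = [x / temperature for x in logits]   # temperature bends the odds
    probs = softmax(logits)
    return random.choices(range(len(scores)), weights=probs)[0]

# here := node at sample_index of the open set's fScores
print(sample_index([5, 9, 25]))
```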
The actual code is on GitHub, in the form of a Python Jupyter notebook.
Heat Maps
Since this gives us a different route every time, I decided to run generations in batches and draw heat maps showing which edges are traversed most often. This worked out beautifully: other than a little debugging, the algorithm worked exactly as I hoped.
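The tallying behind the heat maps is simple: count how often each (undirected) edge shows up across the batch of sampled routes. A self-contained sketch, with made-up example routes:

```python
from collections import Counter

def edge_counts(paths):
    """Count undirected edge traversals across a batch of routes."""
    counts = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[tuple(sorted((a, b)))] += 1   # normalize edge direction
    return counts

paths = [["A", "B", "C"], ["A", "B", "D", "C"], ["A", "B", "C"]]
print(edge_counts(paths).most_common(1))  # → [(('A', 'B'), 3)]
```

The counts then map directly to line thickness or color intensity when drawing the edges.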
Conclusion
I can produce as many variations of a path as I'd like. Big plans for this.
I've got a hunch I'll find a more elegant solution as I go over other common pathfinding algorithms. Can Dijkstra give cheap multiple candidates? Not really sure yet. But when I find out, I'll write further.