A human-driven sort algorithm (MonkeySort)

June 19, 2012

Summary: I made a utility that sorts a provided list of items, asking the user to pick the best item out of two.

Every once in a while you have to arrange certain items into a prioritised list: maybe choose projects you’re going to focus on, or movies to watch, or places to visit, et cetera. Maybe you don’t need the sorted list itself but the top (three, five) items from the list; this is essentially the same task.

The problem is, while you can easily choose the best option out of two (or conclude that they’re equally good), choosing from a lot of options induces decision paralysis.

So I thought, practically all of computer-based sorting algorithms are based on two-option comparisons. Why not make a utility that uses a well-known sorting algorithm, driven by real human decisions?

Demo

There is a live demo available on http://leonid-shevtsov.github.com/monkeysort/.

Implementation

The algorithm could be divided into two parts: the actual sorting (which is a modified quicksort), and storing and updating the comparison matrix (which is the fun part).

I picked quicksort as the algorithm, for its (usually) low number of iterations, and thus comparisons. It had to be reworked for the case that the result of a comparison is not known yet (and has to be decided by the user).

Async, “interruptible” quick sort

At first I decided to make an asynchronous version of quick sort, breaking it up where there’s a need for a comparison. I even got the algorithm coded (which deserves an article of its own), but it turned out to be an unnecessarily complex solution, because I found a really simple brute force one.

Brute force, “restartable” quick sort

If you think about it, quick sort runs near-instantaneously for the number of items that you’d sort manually. So I threw out the async version and made the simplest quick sort implementation which bails out as soon as it hits an unknown comparison. After the user makes the decision the sort restarts from the beginning, and so forth until the entire set is sorted. The actual sorting produces no noticeable delay.

Storing comparisons

It’s convenient to store comparison results in a matrix (it’s not going to be big because the item set is not too big). Element C[i,j] of that matrix stores the comparison result between A[i] and A[j], and can be “better”, “worse”, “equal” or “unknown”.

Initially the matrix is initialized with “unknown” values, except for the main diagonal which is initialized with “equal”.

Processing a comparison result

So the user made a choice and said that A[i] is “better” than A[j]. How do you store that?

  • Obviously, set Cij to “better”.
  • Also set Cji to the opposite (“worse”).

    This is very important; not only because asking the user about A[j] vs A[i] is excessive, but because it may lead to a logical contradiction. After each explicit comparison result is known, we must also store every implicitly known result.

  • Since comparisons are transitive, we actually get more implied data from the result; for each item A[k]:

    • (Cij ∈ { “<”, “>”}) ⋀ (Cjk ∈ {Cij, “=”}) ⇒ Cik = Cij

    • ((Cij = “=”) ⋀ (Cjk is known)) ⇒ Cik = Cjk

    OK, in theory the comparison operator is not necessarily transitive. But we assume that the set of items is totally ordered, because otherwise sorting does not make practical sense.

    Note that all of the implied results can lead to transitive implications of their own, so we recursively process them, too.

  • To avoid going into an infinite loop, we avoid processing elements that already have a value.

Post mortem

After coding the whole thing, I found out that there’s another way of approaching the problem. To sort the set, we need to fill in the comparison table. It may well be that using quicksort to select comparisons won’t produce the minimal number of explicit comparisons required. Maybe you can make a better algorithm based on filling in the matrix, and then run a regular sort.

Question: what is the minimal number of explicit comparisons required to fill in such a comparison table?

Ideas: dynamic programming? graph algorithms?

Two comments. Post another one
  1. 4adeef4094cef099575b60cec053d382 # On June 19, 2012 Patrick Stein (nklein.com) wrote:

    If the goal is just to pick the favorite, you don’t need to fill in very much of the table at all. You should only need log_2(n) comparisons to run a tournament.

    If you want to sort the whole list, then I’d think you could take the best advantage of the transitivity with an insertion-sort type approach. Assume the first thing is already in the list. Then, when adding a new thing, check to see if it is better or worse than the middle thing in the list. If it’s better, check to see if it’s better or worse than the things that were better already. I think this is O(n lg n). But, it doesn’t deal with “don’t care” as an option.

    Also, I just tried the live demo. I swear that I said pears were better than cherries, but pears came up number 3 to cherries 2.

  2. 8dd9fa632ca161d0ca1929a4d99cbe77 # On June 19, 2012 Keith wrote:

    Knuth describes a minimal comparison sort. If I remember correctly, it’s insertion sort with a binary search to determine where to insert.

(looking for markup?)

  • **bold**
  • > quote

cancel