Taxman Programming Project: Model Solution

Description of the Algorithm

At each stage look at the largest number p that has exactly two factors left. If no number has exactly two factors left, look at the largest number that has exactly three factors left, and so on. If there is no number with at least two factors left, then we are done.

Now p is usually the best number to take, but there is one situation when it is not. This is when there is a proper multiple q of p where the only proper factors left for q are factors of p AND there are no proper multiples of q left. In that case, removing p would strand q, leaving it for the taxman at the end, so the best thing to do is to take q yourself. Naturally, if there are several such proper multiples q available, then always take the biggest one!

Implementation Details

Firstly, to make it efficient to find the factors of a number, we precompute an array factor[i] containing the highest prime factor of i.

We also maintain an array number[i] containing the number of factors left for i.

At each stage we have a total order defined on the set of numbers from 1 to n, like this:

p is more desirable than q if:

Note that the array number[i] makes this order very easy to calculate.

Now we arrange our numbers 1 up to n in a binary heap, where the topmost element is the most desirable number. (Note that the definition of desirability means that the topmost number is always available for us to take right until the end.)

The algorithm now goes as follows:

Complexity Analysis

Setting up the arrays factor[i] and number[i] can be done in time O(n log n), using the following sieving technique:
  for (i = 1; i <= n; i = i + 1) {
    factor[i] = 1;
  }

  for (i = 2; i <= n; i = i + 1) {
    if (factor[i] == 1) {
      for (j = i; j <= n; j = j + i) {
        factor[j] = i;
      }
    }
  }
This takes time n + n/2 + n/3 + n/5 + ... + n/n < n(1 + 1/2 + 1/3 + 1/4 + ... + 1/n) = O(n log n).

Heapifying takes O(n).

Now whenever we remove a number p (for whatever reason) we must reduce number[q] by 1 for every multiple q of p. By a similar analysis as for the seiving, the number of reductions we must perform is O(n log n). In addition, every time this is done we may need to move i up or down the heap some places. In the worst case, O(log n) swaps may be required. Therefore, the total time required by the main body of the algorithm is O(n (log n)2).

The complexity of the whole algorithm is O(n (log n)2).

Performance Information

All times are measured in seconds, running on a Pentium II 200MHz and using the java virtual machine that comes with version 4.76 of the Netscape browser.

n score time
100.009
220.038
590.155
10400.766
201210.870
508001.054
10031482.157
200126144.262
500782840.267
10003123500.451
200012426010.951
500077642572.693
10000310189365.539
2000012418289112.106
5000077467769632.596
100000310157541770.231
20000012397258724147.388
50000077385834248404.655
1000000309391446240859.213