6/04/2008

Google Treasure Hunt 2008, 4th and last Puzzle

Two days ago, the 4th and last puzzle of GTH2008 was announced by Google, I suppose at 23:15 UTC/GMT.

I have to confess I didn't solve this problem in one shot. The first half of it was done before I got stoned of drowsiness! The second half was solved after a few hours of sleep. Shame on me... the problem was not difficult at all, and I'm not saying this because I have already solved it! It really was simple...

It can be said that the last puzzle could be, in my opinion, made up of two small puzzles:


  • first, how to deal with prime number generation and/or how to perform primality tests (quickly!)

  • second, how to sum the numbers correctly and find the smallest prime sum of consecutive prime numbers (quickly and hopefully without any mistake!)


I spent like 4 hours solving the whole puzzle in the first half but invested most of it in the first "small" puzzle. My solution for the first "small" puzzle was already fast enough for the problem at hand but I always got the program asking for **more and more** prime numbers... I was generating like millions and millions of prime numbers and even so it never seemed to find a result... My question was for the numbers [11, 83, 183, 631, 1037]. I thought that the sum of each set of 11, 83, 183, 631, and 1037 consecutive prime numbers to be simultaneously equal and a prime number itself would be the reason why it was taking so many prime numbers...

In the second half, after a deserved time sleeping, I decided to "attack" the second "small" puzzle when I immediately noticed a big mess.

Forth problem

When I decided to follow the ancient algorithm of the Sieve of Eratosthenes to pre-compute all the prime numbers up to a specified integer, I could see that second "small" puzzle was "lagging" the whole thing and not the first...

According to the reasoning I used in the algorithm to find the smallest sum of prime numbers, we must keep an eye on the current "smallest sum possible". This is to say that, if we start with two sets (call it "strings") of 3 and 7 consecutive prime numbers the initial smallest sum possible is defined by the first 7 primes (equals 41) because it is impossible for the sum of 7 consecutive prime numbers to be less than the first 7 primes.

Trying to put it simply:
The algorithm tries to shift every string of consecutive prime numbers (a string of length 3, and a string a length 7) by one prime number to the right if the sum of that string is smaller than the current "smallest sum possible". If a string is found whose sum is greater than the current "smallest sum possible", then it means the current "smallest sum possible" kind of "overflowed" and it is not possible to find two strings of length 3 and 7 whose sums are equal and up to the current "smallest sum possible". When this "overflow" happens, the current "smallest sum possible" must be updated to the minimum of the sums of the current strings.

Code for the solution

The following solution is written in C++ and sacrifices a lot of memory. As you will see, it even stores whether odd numbers, zero and one are prime or not which is not needed, as I didn't care much about it... Another problem with the program is that it does not ensure the defined MAX_NUMBER of prime numbers is enough to solve the problem, nor it tests that condition. If it is not enough, you will get a beautiful "segmentation fault", "bus error" or other crash =) Just set the value high enough paying attention to memory limits. The default value of 10000000 should be more than enough for all or most problems for the 4th puzzle.
Nevertheless, the program computes the solution for [11, 83, 183, 631, 1037] and others in about 4 seconds.

Update:
I've just updated my solution removing support for biging library as it is not required at all! My solution now takes a total of 0.233 seconds to print the result (0.221172 seconds for the prime generation up to 8000000 plus 0.00250829 seconds for the second "small" puzzle):

$ time ./sumprimes
Precalculating...
348513 primes pre-calculated
Took 0.224782 seconds
Result: *******
Took 0.00254872 seconds

real 0m0.233s


/**
* Google Treasure Hunt 2008 - Question 4
* by Mário Freitas (imkira at gmail dot com)
*/

// standard libraries
#include
#include

#import

// support for big integers
#include "BigInteger.hh"
#include "BigIntegerUtils.hh"

// please change as desired
#define MAX_NUMBER 8000000
#define NUM_NTH_PRIMES 5000000

// number of strings
#define NUM_STRINGS 5

typedef unsigned long long Counter;

// sarifice space for prime numbers...
bool gIsPrime[MAX_NUMBER+1];
Counter gMaxNumber(MAX_NUMBER);
Counter gMaxNthPrime(NUM_NTH_PRIMES);
Counter gNthPrimes[NUM_NTH_PRIMES+1];
Counter gSumIndices[NUM_STRINGS];
Counter gSumLengths[NUM_STRINGS];
Counter gSumTotals[NUM_STRINGS];

void buildPrimesErastothenes ()
{
Counter lNumber = 2, j, lTwo = 2;

// initialize array to true
memset(&gIsPrime, 1, sizeof(gIsPrime));
gIsPrime[0] = gIsPrime[1] = false;

// do not care about even numbers, zero or one...
// just calculate if an integer is prime or not!
while ((lNumber * lNumber) <= gMaxNumber)
{
j = lNumber * lTwo;

while (j <= gMaxNumber)
{
gIsPrime[j] = false;
j += lNumber;
}

++lNumber;

while ((lNumber <= gMaxNumber) && (!gIsPrime[lNumber]))
{
++lNumber;
}
}

j = 0;

// calculate an array which contains the Nth prime numbers
for (Counter i = 0; i <= gMaxNthPrime; ++i)
{
if (gIsPrime[i])
{
gNthPrimes[j] = i;
++j;
}
}

std::cout << j << " primes pre-calculated\n";
}

Counter sumConsecutivePrimes (Counter aNth, Counter aLength)
{
Counter lSum = 0, i = aNth;

aLength = i + aLength;

while (i < aLength)
{
lSum += gNthPrimes[i];
++i;
}

return lSum;
}

int main (int argc, char *argv[])
{
Counter lOne = 1, lMinimumSum, lNewMinimumSum;
int i = 0, j = 0;
uint64_t t0, t1, t;
mach_timebase_info_data_t info;
double nano, seconds;

// precalculate prime numbers up to MAX_NUMBER
printf("Precalculating...\n");
t0 = mach_absolute_time();
buildPrimesErastothenes();
t1 = mach_absolute_time();
t = t1 - t0;

mach_timebase_info(&info);
nano = 1e-9 * ( (double) info.numer) / ((double) info.denom);
seconds = (double)t * nano;

std::cout << "Took " << seconds << " seconds" << std::endl;

// initialize the size of each string
gSumLengths[0] = 11;
gSumLengths[1] = 83;
gSumLengths[2] = 183;
gSumLengths[3] = 631;
gSumLengths[4] = 1037;

t0 = mach_absolute_time();
// initialize the sums and starting offsets of all strings of consecutive primes
for (i = 0; i < NUM_STRINGS; ++i)
{
gSumIndices[i] = 0;
gSumTotals[i] = sumConsecutivePrimes(gSumIndices[i], gSumLengths[i]);
}

// calculate the initial "smallest sum possible"
lMinimumSum = gSumTotals[NUM_STRINGS-1];

// while the sums are not simultaneously equal and a prime number...
while ((j != 0) || (!gIsPrime[gSumTotals[NUM_STRINGS-1]]))
{
i = j = 0;
for (i = 0; i < NUM_STRINGS; ++i)
{
// keep the "smallest sum possible"
lNewMinimumSum = lMinimumSum;

// the sum of the string is too small?
if (gSumTotals[i] < lMinimumSum)
{
// remove the leftmost element of the sum
gSumTotals[i] -= gNthPrimes[gSumIndices[i]];
// shift the string to the right
++gSumIndices[i];
// add the new rightmost element to the sum
gSumTotals[i] += gNthPrimes[gSumIndices[i] + gSumLengths[i] - lOne];
// a mismatch was found...
++j;
}
else
// the sum of the string overflowed?
if (gSumTotals[i] > lMinimumSum)
{
// find the minimum "smallest sum possible" of the current strings
if (lNewMinimumSum == lMinimumSum)
{
lNewMinimumSum = gSumTotals[i];
}
else
if (gSumTotals[i] < lNewMinimumSum)
{
lNewMinimumSum = gSumTotals[i];
}

// a mismatch was found...
++j;
}

// update the "smallest sum possible"...
lMinimumSum = lNewMinimumSum;
}
}
t1 = mach_absolute_time();
t = t1 - t0;

// print the result!
std::cout << "Result: " << gSumTotals[NUM_STRINGS-1] << std::endl;

mach_timebase_info(&info);
nano = 1e-9 * ( (double) info.numer) / ((double) info.denom);
seconds = (double)t * nano;

std::cout << "Took " << seconds << " seconds" << std::endl;

exit(0);
}

4 comments:

Turz said...

12 lines of MATLAB code are enough.

imkira said...

Oh really :)
I should have learned MATLAB. Could you post your solution in MATLAB?

Thanks turz

Schwolop said...

load primes; % Get first 1e6 primes from file.
numPrimes = size(primes,1)*size(primes,2);
searches = [3,7,229,1365];
numSearches = length(searches);
sums = zeros(numSearches,numPrimes); % Space to store sums.

% Determine n-sums of all the primes
for i = 1:numSearches
n = searches(i);
tempSum = sum(primes(1:n)); % Get sum of first n primes;
sums(i,1) = tempSum;
for j = 2:(numPrimes - searches(i))
sums(i,j) = sums(i,j-1) - primes(j-1) + primes(j+n-1);
end
end

% Search through the n-sums for each n, to find all those sums which exist
% in each set.
biggestSums = sums(numSearches,1:numPrimes-searches(numSearches));
% (The least values will be in the set with biggest n, so iterate
% through that one, and get only non-zero values for this set.)
for i = 1:length(biggestSums)
for( j = 1:numSearches-1 )
a = find( sums(j,:) == biggestSums(i) );
if( isempty(a) ) % If this sum was not found in any set, move on.
continue;
end
end
if( ~isempty(a) ) % If after going through each set, this value is in
% all of them, then test if it is actually a prime.
a = find( primes == biggestSums(i) );
if( ~isempty(a) ) % If it is a prime, then stop because we're done!
string = '';
for( k = 1:numSearches-1 )
string = strcat( string, sprintf(' %d,',searches(k)) );
end
string = strcat( string, sprintf(' and %d', searches(numSearches)) );
disp(['The smallest prime which is a sum of',string,' consecutive primes is: ',num2str(biggestSums(i))]);
break;
end
end
end

imkira said...

Thanks schwolop!

That is not exactly "12 lines of MATLAB code" but pretty good anyway :)

How much time does it take to compute all that for [11, 83, 183, 631, 1037] ?

Best regards