chasing panaitopol primes

August 16, 2025

algorithms · mathematics · optimization · go · project-euler · python

Chasing Panaitopol Primes

From Brute Force Pain to Elegant Math


I've been working through Project Euler problems, and Problem 291 ("Panaitopol primes") sent me down a fascinating rabbit hole. What started as a naïve double loop taking hours transformed into a 5-second solution through mathematical insight, careful optimization, and some surprising lessons about parallelization. Here's the complete journey.

The Problem

A Panaitopol prime is a prime number that can be expressed as:

p = (a⁴ - b⁴) / (a³ + b³)

where a > b > 0 are positive integers.

The challenge: Find how many Panaitopol primes exist below 5 × 10¹⁵. Yeah, that is 5 followed by 15 zeroes. 5_000_000_000_000_000. To put that in perspective, if light had to travel this number of kilometers, it would take about 528.5 years. That's roughly the amount of time it takes for light from Betelgeuse to reach Earth.
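To make the definition concrete, the smallest example turns out to be 5, and one pair that witnesses it (the derivation later in this post explains where it comes from) is a = 6, b = 3:

(6⁴ - 3⁴) / (6³ + 3³) = (1296 - 81) / (216 + 27) = 1215 / 243 = 5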

The Naïve Beginning - Two Loops, Too Much Pain

Looking at the formula for Panaitopol primes:

p = (a⁴ - b⁴) / (a³ + b³)

My first instinct was the obvious one - just try all possible (a, b) pairs. But here's the catch: not all (a, b) pairs produce integer results! The division needs to be exact. So I implemented it with remainder checking:

from sympy import isprime

def panaitopol(x, y):
    x3 = x ** 3
    x4 = x3 * x
    y3 = y ** 3
    y4 = y3 * y
    # return quotient and remainder so we can check that the division is exact
    return (x4 - y4) // (x3 + y3), (x4 - y4) % (x3 + y3)

l = set()
for i in range(1, 5001):
    for j in range(1, i):
        pp, remainder = panaitopol(i, j)
        if remainder == 0 and isprime(pp):
            l.add(pp)

Even with bounds of just 5,000, this was painfully slow. That's $O(n^2)$ candidate pairs, and for the actual problem limit, this approach would take literally forever. Plus, I had to check divisibility for each pair - double the pain!

First Algebraic Simplification (Still Stuck with Two Variables)

I tried simplifying the expression algebraically. Let's start with the original formula and factor both the numerator and denominator:

p = (a⁴ - b⁴) / (a³ + b³)

For the numerator, we can use the difference of squares twice:

a⁴ - b⁴ = (a²)² - (b²)² = (a² - b²)(a² + b²)

And then factor a² - b² further:

a⁴ - b⁴ = (a - b)(a + b)(a² + b²)

For the denominator, we use the sum of cubes formula:

a³ + b³ = (a + b)(a² - ab + b²)

So our fraction becomes:

p = ((a - b)(a + b)(a² + b²)) / ((a + b)(a² - ab + b²))

We can cancel the common factor (a + b):

p = ((a - b)(a² + b²)) / (a² - ab + b²)

This was cleaner, but computationally? Still O(n²). I still needed to check every (a, b) pair, and the formula still had two independent variables. The key insight I was missing: I needed to find a relationship between a and b that would guarantee integer results and reduce the problem to one variable.
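A quick numerical spot-check of the cancellation (an illustrative sketch, not part of my original exploration) compares the two fractions by cross-multiplying, whether or not the result happens to be an integer:

package main

import "fmt"

// Spot-check: (a - b)(a² + b²) / (a² - ab + b²) should equal (a⁴ - b⁴) / (a³ + b³)
// as an exact fraction for every pair a > b > 0.
func main() {
    for a := uint64(2); a <= 6; a++ {
        for b := uint64(1); b < a; b++ {
            origNum, origDen := a*a*a*a-b*b*b*b, a*a*a+b*b*b
            simpNum, simpDen := (a-b)*(a*a+b*b), a*a-a*b+b*b
            // cross-multiply to compare the fractions without floating point
            fmt.Printf("a=%d b=%d equal=%v\n", a, b, origNum*simpDen == simpNum*origDen)
        }
    }
}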

The Breakthrough - Variable Substitution

This was the key insight! What if instead of treating a and b as independent variables, I could couple them? I tried substituting:

  • a = n + k
  • b = n

This transforms our simplified formula into:

p = ((n + k - n)((n + k)² + n²)) / ((n + k)² - n(n + k) + n²)

Simplifying the numerator:

numerator = k × ((n + k)² + n²)

And the denominator becomes:

denominator = (n + k)² - n(n + k) + n²

Expanding the denominator:

= n² + 2nk + k² - n² - nk + n²
= n² + nk + k²

Now, for this to produce an integer, the denominator has to divide the numerator cleanly. Through experimentation (and some mathematical analysis), I discovered that when k = 1, something magical happens.

Let's look at k = 1:

numerator = 1 × ((n + 1)² + n²) = 2n² + 2n + 1
denominator = n² + n + 1

The denominator doesn't divide this numerator directly (note that 2n² + 2n + 1 = 2(n² + n + 1) - 1), but the two are coprime, and that turns out to be the key. Working through the general case (write a = gx, b = gy with g = gcd(a, b); then p(x² - xy + y²) = g(x - y)(x² + y²), and since x² - xy + y² is coprime to both factors on the right, it must divide g, while primality forces x - y = 1) shows that every Panaitopol prime comes from a = (n² + n + 1)(n + 1) and b = (n² + n + 1)n. With that pair, the common factor cancels and the formula collapses to:

p = n² + (n + 1)² = 2n² + 2n + 1

So the Panaitopol primes are exactly the primes of this form.
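A quick sanity check in Go (a minimal sketch, just verifying the closed form for small n with that witness pair):

package main

import "fmt"

// For small n, the pair a = (n²+n+1)(n+1), b = (n²+n+1)·n should make
// (a⁴ - b⁴) / (a³ + b³) come out to exactly 2n² + 2n + 1.
func main() {
    for n := uint64(1); n <= 5; n++ {
        g := n*n + n + 1
        a, b := g*(n+1), g*n
        num, den := a*a*a*a-b*b*b*b, a*a*a+b*b*b
        fmt.Printf("n=%d: (a,b)=(%d,%d) -> %d remainder %d, expected %d\n",
            n, a, b, num/den, num%den, 2*n*(n+1)+1)
    }
}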

This changed everything! From O(n²) to O(n) - a massive algorithmic improvement. Instead of checking all (a, b) pairs, I only needed to iterate through values of n.

def panaitopol_simplified(n):
    return n * n + (n + 1) * (n + 1)
    # Or equivalently: 2 * n * (n + 1) + 1

Bounding the Search Space

With the simplified formula, I could now bound the search space properly:

2n² + 2n + 1 < 5 × 10¹⁵

Solving for n:

n < √(5 × 10¹⁵ / 2) ≈ 5 × 10⁷

Only 50 million values to check instead of exploring a 2D space!

The Implementation Evolution

Python + SymPy (6 minutes)

from sympy import isprime
from math import isqrt
from tqdm import tqdm

LIMIT = 5 * (10**15)
count = 0
for n in tqdm(range(1, isqrt(LIMIT // 2) + 1)):
    p = 2 * n * (n + 1) + 1
    if p < LIMIT and isprime(p):
        count += 1

First Go Implementation (43 seconds)

Using Go's big.Int.ProbablyPrime:

// count, maxN (≈ 5 × 10⁷) and b (a reusable *big.Int) are declared outside this loop
for n := uint64(1); n <= maxN; n++ {
    p := 2*n*(n+1) + 1
    b.SetUint64(p)
    if b.ProbablyPrime(1) {
        count++
    }
}

Optimized Go with Custom Miller-Rabin (5 seconds!)

The game-changer was implementing a specialized 64-bit Miller-Rabin test:

func deterministicMillerRabin64(n uint64) bool {
    // Deterministic for all n < 2^64 using the 12 prime bases 2, 3, 5, ..., 37
    // ... specialized implementation (a sketch follows below)
}

// With bit manipulation optimization
p := ((n * (n + 1)) << 1) + 1
if deterministicMillerRabin64(p) {
    count++
}
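My exact implementation isn't reproduced line-for-line here; the following is a minimal self-contained sketch of such a deterministic 64-bit Miller-Rabin test, using math/bits for overflow-free modular multiplication:

import "math/bits"

// mulmod computes (a * b) mod m using a 128-bit intermediate product.
func mulmod(a, b, m uint64) uint64 {
    hi, lo := bits.Mul64(a, b)
    _, rem := bits.Div64(hi, lo, m)
    return rem
}

// powmod computes (base^exp) mod m by square-and-multiply.
func powmod(base, exp, m uint64) uint64 {
    result := uint64(1)
    base %= m
    for exp > 0 {
        if exp&1 == 1 {
            result = mulmod(result, base, m)
        }
        base = mulmod(base, base, m)
        exp >>= 1
    }
    return result
}

// deterministicMillerRabin64 reports whether n is prime; the 12 bases below
// are known to give a deterministic answer for every n < 2^64.
func deterministicMillerRabin64(n uint64) bool {
    bases := []uint64{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}
    if n < 2 {
        return false
    }
    for _, p := range bases {
        if n%p == 0 {
            return n == p
        }
    }
    // write n-1 as d·2^r with d odd
    d, r := n-1, 0
    for d&1 == 0 {
        d >>= 1
        r++
    }
    for _, a := range bases {
        x := powmod(a, d, n)
        if x == 1 || x == n-1 {
            continue
        }
        witness := true
        for i := 0; i < r-1; i++ {
            x = mulmod(x, x, n)
            if x == n-1 {
                witness = false
                break
            }
        }
        if witness {
            return false // base a proves n composite
        }
    }
    return true
}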

The Parallelization Experiments

Lessons from Over-Concurrentization

I experimented with varying the number of goroutines. (The graph of the results plotted the number of goroutines on the x axis against the run time in seconds on the y axis.)

Key findings:

  • Sweet spot: 8 goroutines (matching my CPU's logical cores)
  • Diminishing returns: Beyond 8, context switching overhead dominates
  • Over-concurrentization hurts: Too many goroutines actually degraded performance
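For reference, here's a minimal sketch of the chunked goroutine setup (illustrative variable names; it assumes the deterministicMillerRabin64 sketched above lives in the same package):

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    const limit = uint64(5_000_000_000_000_000) // 5 × 10^15
    maxN := uint64(50_000_000)                  // ≈ √(limit / 2)
    workers := runtime.NumCPU()                 // 8 was the sweet spot on my machine

    var wg sync.WaitGroup
    counts := make([]uint64, workers) // one slot per worker, no shared counter
    chunk := maxN / uint64(workers)

    for w := 0; w < workers; w++ {
        start := uint64(w)*chunk + 1
        end := start + chunk - 1
        if w == workers-1 {
            end = maxN // last worker picks up the remainder
        }
        wg.Add(1)
        go func(w int, start, end uint64) {
            defer wg.Done()
            var local uint64
            for n := start; n <= end; n++ {
                p := ((n * (n + 1)) << 1) + 1
                if p < limit && deterministicMillerRabin64(p) {
                    local++
                }
            }
            counts[w] = local
        }(w, start, end)
    }
    wg.Wait()

    var total uint64
    for _, c := range counts {
        total += c
    }
    fmt.Println("Panaitopol primes below 5e15:", total)
}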

Failed Distributed Computing Attempts

I also tried:

  • Python multiprocessing: Helped, but Python overhead was still significant
  • PySpark: Framework overhead completely dominated the actual computation
  • Spark in Google Colab: Even worse due to JVM startup costs

Lesson: Don't reach for distributed computing until you've exhausted single-machine optimizations.

Micro-Optimizations That Mattered

Bit Shifting vs Multiplication

// Before
p := 2 * n * (n + 1) + 1

// After - ~1.67% improvement
p := ((n * (n + 1)) << 1) + 1

Over 50 million iterations, even 1-2% adds up!
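If you want to measure a micro-difference like this yourself, a throwaway Go benchmark is the honest way to do it (a hypothetical sketch; save as something like shift_bench_test.go and run go test -bench=. — results will vary by compiler version and CPU):

package main

import "testing"

var sink uint64 // keeps the compiler from optimizing the loop bodies away

func BenchmarkMulByTwo(b *testing.B) {
    for i := 0; i < b.N; i++ {
        n := uint64(i&0xFFFFFF) + 1
        sink = 2*n*(n+1) + 1
    }
}

func BenchmarkShiftLeft(b *testing.B) {
    for i := 0; i < b.N; i++ {
        n := uint64(i&0xFFFFFF) + 1
        sink = ((n * (n + 1)) << 1) + 1
    }
}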

Trial Division Pre-filtering

// Quick elimination of ~90% of composites
if !passesTrial(p) {
    continue
}
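passesTrial itself isn't shown above; a minimal sketch of such a pre-filter, matching the small primes up to 61 listed in the pseudocode later on, might look like:

// Small primes up to 61; dividing by these cheaply eliminates the bulk of composites.
var smallPrimes = [...]uint64{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61}

// passesTrial reports whether p survives trial division by the small primes.
// It returns true for candidates that might still be prime (including the small primes themselves).
func passesTrial(p uint64) bool {
    for _, q := range smallPrimes {
        if p%q == 0 {
            return p == q
        }
    }
    return true
}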

The LRU Cache Trap

I tried adding @lru_cache to the Python version:

@lru_cache(maxsize=None)
def panaitopol(x, y):
    # ... computation

Result: Performance got WORSE! Why? With 25 million unique calls and zero cache hits, I was just adding overhead. Lesson: caching only helps when you have repeated computations.

The Final Performance Journey

Approach | Time | Speedup | Notes
--- | --- | --- | ---
Python naive O(n²) | Hours | 1x | Double nested loop
Python with safe division | Still hours | ~1x | Fixed correctness, not speed
Python simplified O(n) | 6 min | ~100x | Mathematical breakthrough!
Go with big.Int | 43 sec | ~8x | Language change
Go with custom Miller-Rabin | 5 sec | ~8.6x | Algorithm optimization
Total improvement | | ~7,200x | Math + Engineering

Key Lessons Learned

  1. Mathematical insight beats raw compute power: The O(n²) to O(n) reduction was worth more than all parallelization combined.

  2. Premature parallelization is a root of evil: I tried Spark, considered Hadoop, experimented with channels - but the real wins came from algorithm and implementation improvements.

  3. Know your tools deeply:

    • Python's sympy.isprime: Convenient but slow
    • Go's big.Int.ProbablyPrime: Has overhead for small numbers
    • Custom Miller-Rabin for 64-bit: Blazing fast
  4. Profile everything: My goroutine experiments revealed the exact saturation point of my hardware.

  5. Small optimizations matter at scale: Even changing 2*n to n<<1 gave measurable improvements over 50 million iterations.

  6. Caching isn't always the answer: My LRU cache experiment showed that blindly adding caching can hurt performance.

The Complete Optimized Solution

Here's the final algorithm in pseudocode that runs in under 5 seconds:

Algorithm: Count Panaitopol Primes
Input: LIMIT = 5 × 10^15

1. Calculate upper bound for n:
   max_n = floor(sqrt(LIMIT / 2))

2. Initialize count = 0

3. Split range [1, max_n] into chunks for parallel processing

4. For each chunk in parallel:
   For n from chunk_start to chunk_end:
     a. Calculate p = 2n(n+1) + 1
        (using bit shift: p = ((n × (n+1)) << 1) + 1)
     
     b. If p >= LIMIT, skip
     
     c. Quick trial division by small primes (2, 3, 5, ..., 61)
        If divisible by any, skip (unless p equals that prime)
     
     d. Apply deterministic Miller-Rabin test with the 12 prime bases
        {2, 3, 5, ..., 37} => (deterministic for all n < 2^64)
     
     e. If prime, increment local count

5. Sum all local counts from parallel workers

6. Return total count

The key optimizations:

  • Mathematical reduction: From O(n²) double loop to O(n) single loop
  • Bit manipulation: Replace multiplication by 2 with left shift
  • Trial division: Eliminate ~90% of composites cheaply
  • Deterministic primality: No probabilistic false positives
  • Parallelization: Use all CPU cores with balanced chunks

Final Thoughts

What started as a brute force two-dimensional search became a clean, optimized solution through:

  1. Mathematical simplification (the biggest win)
  2. Algorithmic optimization (custom primality testing)
  3. Language selection (Go's performance)
  4. Careful parallelization (just enough, not too much)
  5. Micro-optimizations (bit operations, trial division)

The journey from hours to 5 seconds taught me that before throwing compute power at a problem, always ask: "Can this be expressed more elegantly?" Often, the answer transforms the impossible into the trivial.

Final answer: ?????? Panaitopol primes exist below 5 × 10¹⁵. I hope you have fun figuring out the answer yourself! 😄


Want to try this yourself? The key pieces of the optimized Go implementation are sketched above. For the mathematically curious, I recommend starting with the algebraic derivation - it's a beautiful piece of number theory.

Bonus: What's Next?

If you enjoyed this optimization journey, you might also like:

  • Exploring SIMD instructions for even faster primality testing
  • Investigating why certain (a, b) pairs produce primes more often
  • Generalizing to other forms of specialized primes
  • Applying these optimization patterns to other Project Euler problems

What Project Euler problems have sent you down similar rabbit holes? I'd love to hear about your optimization journeys!