Chasing Panaitopol Primes
From Brute Force Pain to Elegant Math
contents
- the problem
- the naïve beginning - two loops, too much pain
- first algebraic simplification (still stuck with two variables)
- the breakthrough - variable substitution
- bounding the search space
- the implementation evolution
- the parallelization experiments
- micro-optimizations that mattered
- the final performance journey
- key lessons learned
- the complete optimized solution
- final thoughts
- bonus: what's next?
I've been working through Project Euler problems, and Problem 291 ("Panaitopol primes") sent me down a fascinating rabbit hole. What started as a naïve double loop taking hours transformed into a 5-second solution through mathematical insight, careful optimization, and some surprising lessons about parallelization. Here's the complete journey.
The Problem
A Panaitopol prime is a prime number that can be expressed as:
$p = \frac{a^4 - b^4}{a^3 + b^3}$
where a and b are positive integers with a > b.
The challenge: Find how many Panaitopol primes exist below $5 \times 10^{15}$. Yeah, that's a 5 followed by 15 zeroes: 5_000_000_000_000_000. To put that in perspective, light would need about 528.5 years to travel that many kilometers, which is roughly how long light from Betelgeuse takes to reach Earth.
The Naïve Beginning - Two Loops, Too Much Pain
Looking at the formula for Panaitopol primes:
$p = \frac{a^4 - b^4}{a^3 + b^3}$
My first instinct was the obvious one - just try all possible (a, b) pairs. But here's the catch: not all (a, b) pairs produce integer results! The division needs to be exact. So I implemented it with remainder checking:
```python
from sympy import isprime

def panaitopol(x, y):
    x3 = x ** 3
    x4 = x3 * x
    y3 = y ** 3
    y4 = y3 * y
    # divmod returns (quotient, remainder) in one call
    return divmod(x4 - y4, x3 + y3)

primes = set()
for i in range(1, 5001):
    for j in range(1, i):
        pp, remainder = panaitopol(i, j)
        if remainder == 0 and isprime(pp):
            primes.add(pp)
```
Even with bounds of just 5,000, this was painfully slow. That's $O(n^2)$ candidate pairs, and at the actual problem limit this approach would simply never finish. Plus, I had to check divisibility for each pair - double the pain!
First Algebraic Simplification (Still Stuck with Two Variables)
I tried simplifying the expression algebraically. Let's start with the original formula and factor both the numerator and denominator:
$p = \frac{a^4 - b^4}{a^3 + b^3}$
For the numerator, we can use the difference of squares twice:
$a^4 - b^4 = (a^2)^2 - (b^2)^2 = (a^2 - b^2)(a^2 + b^2)$
And then factor $a^2 - b^2$ further:
$a^4 - b^4 = (a - b)(a + b)(a^2 + b^2)$
For the denominator, we use the sum of cubes formula:
$a^3 + b^3 = (a + b)(a^2 - ab + b^2)$
So our fraction becomes:
$p = \frac{(a - b)(a + b)(a^2 + b^2)}{(a + b)(a^2 - ab + b^2)}$
We can cancel the common factor $(a + b)$:
$p = \frac{(a - b)(a^2 + b^2)}{a^2 - ab + b^2}$
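If you'd rather let a computer double-check the algebra, here's a quick numeric spot-check (a minimal Go sketch of my own; it compares the two fractions by cross-multiplying, so no division or divisibility assumptions are needed):

```go
package main

import "fmt"

func main() {
    // The two forms are equal iff lhsNum*rhsDen == rhsNum*lhsDen.
    // Small a, b keep every product comfortably inside uint64.
    for a := uint64(2); a <= 50; a++ {
        for b := uint64(1); b < a; b++ {
            lhsNum := a*a*a*a - b*b*b*b
            lhsDen := a*a*a + b*b*b
            rhsNum := (a - b) * (a*a + b*b)
            rhsDen := a*a - a*b + b*b
            if lhsNum*rhsDen != rhsNum*lhsDen {
                fmt.Println("factoring mistake at", a, b)
                return
            }
        }
    }
    fmt.Println("simplification verified for all a > b up to 50")
}
```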
This was cleaner, but computationally? Still O(n²). I still needed to check every (a, b) pair, and the formula still had two independent variables. The key insight I was missing: I needed to find a relationship between a and b that would guarantee integer results and reduce the problem to one variable.
The Breakthrough - Variable Substitution
This was the key insight! What if instead of treating a and b as independent variables, I could couple them? I tried substituting:
- a = n + k
- b = n
This transforms our simplified formula into:
$p = \frac{(n + k - n)\left((n + k)^2 + n^2\right)}{(n + k)^2 - n(n + k) + n^2}$
Simplifying the numerator:
$\text{numerator} = k \left((n + k)^2 + n^2\right)$
And the denominator becomes:
$\text{denominator} = (n + k)^2 - n(n + k) + n^2$
Expanding the denominator:
$= n^2 + 2nk + k^2 - n^2 - nk + n^2$
$= n^2 + nk + k^2$
Now, for this to produce an integer, the denominator has to divide the numerator. Through experimentation (and some mathematical analysis), I discovered that k = 1 is where something magical happens.
Let's look at k = 1:
$\text{numerator} = 1 \times \left((n + 1)^2 + n^2\right) = 2n^2 + 2n + 1$
$\text{denominator} = n^2 + n + 1$
At first glance this looks like a dead end: numerator and denominator are coprime, since $2(n^2 + n + 1) - (2n^2 + 2n + 1) = 1$, so the fraction never reduces on its own. The magic is that the original formula is homogeneous of degree 1: scaling both a and b by a constant c scales p by exactly c. Choosing $c = n^2 + n + 1$, i.e. $a = (n + 1)(n^2 + n + 1)$ and $b = n(n^2 + n + 1)$, cancels the denominator and leaves an integer:
$p = n^2 + (n + 1)^2 = 2n^2 + 2n + 1$
And a short gcd argument shows this construction captures every Panaitopol prime: writing $a = gm$, $b = gn$ with $\gcd(m, n) = 1$, the factor $m^2 - mn + n^2$ is coprime to both $m - n$ and $m^2 + n^2$, so it must divide g, and primality then forces $m - n = 1$.
This changed everything! From O(n²) to O(n) - a massive algorithmic improvement. Instead of checking all (a, b) pairs, I only needed to iterate through values of n.
```python
def panaitopol_simplified(n):
    # Or equivalently: 2 * n * (n + 1) + 1
    return n * n + (n + 1) * (n + 1)
```
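To convince yourself that the scaled construction from the derivation really does feed back into the original formula, here's a quick spot-check (a minimal Go sketch; n stays small so that a^4 fits in a uint64):

```go
package main

import "fmt"

func main() {
    // Build a = (n+1)(n²+n+1), b = n(n²+n+1) and plug them into the
    // ORIGINAL formula; the result should equal 2n² + 2n + 1.
    for n := uint64(1); n <= 20; n++ {
        c := n*n + n + 1
        a, b := (n+1)*c, n*c
        num := a*a*a*a - b*b*b*b
        den := a*a*a + b*b*b
        want := 2*n*(n+1) + 1
        if num%den != 0 || num/den != want {
            fmt.Println("mismatch at n =", n)
            return
        }
    }
    fmt.Println("construction verified for n = 1..20")
}
```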
Bounding the Search Space
With the simplified formula, I could now bound the search space properly:
$2n^2 + 2n + 1 < 5 \times 10^{15}$
Solving for n:
$n < \sqrt{5 \times 10^{15} / 2} \approx 5 \times 10^7$
Only 50 million values to check instead of exploring a 2D space!
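In code I computed this bound defensively, since a floating-point square root can be off by one at this magnitude. A small Go sketch of the idea:

```go
// Largest n with 2n(n+1)+1 < limit: start from the float estimate,
// then nudge it to correct for possible rounding error.
maxN := uint64(math.Sqrt(float64(limit) / 2))
for 2*maxN*(maxN+1)+1 >= limit {
    maxN--
}
for 2*(maxN+1)*(maxN+2)+1 < limit {
    maxN++
}
```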
The Implementation Evolution
Python + SymPy (6 minutes)
```python
from sympy import isprime
from math import isqrt
from tqdm import tqdm

LIMIT = 5 * (10**15)
count = 0
# Bound n by sqrt(LIMIT / 2), not sqrt(LIMIT), so we don't count
# primes at or above the limit
for n in tqdm(range(1, isqrt(LIMIT // 2) + 1)):
    p = 2 * n * (n + 1) + 1
    if p < LIMIT and isprime(p):
        count += 1
```
First Go Implementation (43 seconds)
Using Go's big.Int.ProbablyPrime:
```go
var b big.Int
count := 0
for n := uint64(1); n <= maxN; n++ {
    p := 2*n*(n+1) + 1
    b.SetUint64(p)
    if b.ProbablyPrime(1) {
        count++
    }
}
```
Optimized Go with Custom Miller-Rabin (5 seconds!)
The game-changer was implementing a specialized 64-bit Miller-Rabin test:
```go
func deterministicMillerRabin64(n uint64) bool {
    // Deterministic for all n < 2^64 using the first twelve prime
    // bases {2, 3, 5, ..., 37}
    // ... specialized implementation (sketched below)
}

// With bit manipulation optimization
p := ((n * (n + 1)) << 1) + 1
if deterministicMillerRabin64(p) {
    count++
}
```
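Here's what a complete version of that test can look like. This is a sketch rather than a verbatim copy of my code (the mulmod, powmod, and mrBases names are mine): math/bits provides overflow-free 64-bit modular multiplication, and the first twelve primes are a known deterministic witness set for every 64-bit input:

```go
package main // helpers live alongside the main program

import "math/bits"

// The first twelve primes. As Miller-Rabin bases they are a known
// deterministic witness set for every n < 2^64.
var mrBases = []uint64{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}

// mulmod returns (a * b) mod m without overflow, via the 128-bit
// product from math/bits (requires a, b < m).
func mulmod(a, b, m uint64) uint64 {
    hi, lo := bits.Mul64(a, b)
    _, rem := bits.Div64(hi, lo, m)
    return rem
}

// powmod returns base^exp mod m by binary exponentiation.
func powmod(base, exp, m uint64) uint64 {
    result := uint64(1)
    base %= m
    for exp > 0 {
        if exp&1 == 1 {
            result = mulmod(result, base, m)
        }
        base = mulmod(base, base, m)
        exp >>= 1
    }
    return result
}

func deterministicMillerRabin64(n uint64) bool {
    if n < 2 {
        return false
    }
    // Screen by the base primes first; this also guarantees every
    // base below is nonzero mod n.
    for _, p := range mrBases {
        if n%p == 0 {
            return n == p
        }
    }
    // Write n-1 = d * 2^s with d odd.
    d, s := n-1, 0
    for d&1 == 0 {
        d >>= 1
        s++
    }
    for _, a := range mrBases {
        x := powmod(a, d, n)
        if x == 1 || x == n-1 {
            continue
        }
        composite := true
        for i := 0; i < s-1; i++ {
            x = mulmod(x, x, n)
            if x == n-1 {
                composite = false
                break
            }
        }
        if composite {
            return false
        }
    }
    return true
}
```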
The Parallelization Experiments
Lessons from Over-Concurrentization
I experimented with varying the number of goroutines:
[Graph: time to run the program (seconds) vs. number of goroutines]
Key findings:
- Sweet spot: 8 goroutines (matching my CPU's logical cores)
- Diminishing returns: Beyond 8, context switching overhead dominates
- Over-concurrentization hurts: Too many goroutines actually degraded performance
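The experiment itself was nothing fancy - just timing a full run at each worker count, something like this sketch (it assumes the countPanaitopolPrimes helper from the final solution at the end of this post):

```go
// Time one full run per goroutine count to find the saturation point.
for _, workers := range []int{1, 2, 4, 8, 16, 32, 64, 128} {
    start := time.Now()
    count := countPanaitopolPrimes(5_000_000_000_000_000, workers)
    fmt.Printf("%3d goroutines: count=%d in %v\n", workers, count, time.Since(start))
}
```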
Failed Distributed Computing Attempts
I also tried:
- Python multiprocessing: Helped, but Python overhead was still significant
- PySpark: Framework overhead completely dominated the actual computation
- Spark in Google Colab: Even worse due to JVM startup costs
Lesson: Don't reach for distributed computing until you've exhausted single-machine optimizations.
Micro-Optimizations That Mattered
Bit Shifting vs Multiplication
```go
// Before
p := 2*n*(n+1) + 1

// After - ~1.67% improvement
p := ((n * (n + 1)) << 1) + 1
```
Over 50 million iterations, even 1-2% adds up!
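If you want to verify this kind of micro-difference on your own machine, Go's built-in benchmark harness is the honest way to do it. A minimal sketch (save it as a _test.go file; the sink variable is mine and exists to keep the compiler from discarding the loop body as dead code):

```go
package main

import "testing"

// sink defeats dead-code elimination so the benchmarks measure real work.
var sink uint64

func BenchmarkMultiply(b *testing.B) {
    for i := 0; i < b.N; i++ {
        n := uint64(i)
        sink = 2*n*(n+1) + 1
    }
}

func BenchmarkShift(b *testing.B) {
    for i := 0; i < b.N; i++ {
        n := uint64(i)
        sink = ((n * (n + 1)) << 1) + 1
    }
}
```

Run it with go test -bench=. and compare the ns/op columns.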
Trial Division Pre-filtering
```go
// Quick elimination of ~90% of composites
if !passesTrial(p) {
    continue
}
```
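The snippet assumes a passesTrial helper. A minimal sketch of what it can look like, using the small primes up to 61 that the pseudocode later in this post mentions (the trialPrimes name is illustrative):

```go
// Primes up to 61, matching the trial-division step in the pseudocode.
var trialPrimes = []uint64{
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
    31, 37, 41, 43, 47, 53, 59, 61,
}

// passesTrial reports whether p survives trial division by the small
// primes above; p being one of those primes itself counts as passing.
func passesTrial(p uint64) bool {
    for _, q := range trialPrimes {
        if p%q == 0 {
            return p == q
        }
    }
    return true
}
```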
The LRU Cache Trap
I tried adding @lru_cache to the Python version:
```python
from functools import lru_cache

@lru_cache(maxsize=None)
def panaitopol(x, y):
    ...  # same computation as before
```
Result: Performance got WORSE! Why? With 25 million unique calls and zero cache hits, I was just adding overhead. Lesson: caching only helps when you have repeated computations.
The Final Performance Journey
| Approach | Time | Speedup (vs. previous) | Notes |
| --- | --- | --- | --- |
| Python naïve O(n²) | Hours | 1x | Double nested loop |
| Python with safe division | Still hours | ~1x | Fixed correctness, not speed |
| Python simplified O(n) | 6 min | ~100x | Mathematical breakthrough! |
| Go with big.Int | 43 sec | ~8x | Language change |
| Go with custom Miller-Rabin | 5 sec | ~8.6x | Algorithm optimization |
| Total improvement | | ~7,200x | Math + Engineering |
Key Lessons Learned
- Mathematical insight beats raw compute power: The O(n²) to O(n) reduction was worth more than all parallelization combined.
- Premature parallelization is a root of evil: I tried Spark, considered Hadoop, experimented with channels - but the real wins came from algorithm and implementation improvements.
- Know your tools deeply:
  - Python's sympy.isprime: convenient but slow
  - Go's big.Int.ProbablyPrime: has overhead for small numbers
  - A custom Miller-Rabin for 64-bit integers: blazing fast
- Profile everything: My goroutine experiments revealed the exact saturation point of my hardware.
- Small optimizations matter at scale: Even changing 2*n to n<<1 gave measurable improvements over 50 million iterations.
- Caching isn't always the answer: My LRU cache experiment showed that blindly adding caching can hurt performance.
The Complete Optimized Solution
Here's the final algorithm in pseudocode that runs in under 5 seconds:
```
Algorithm: Count Panaitopol Primes
Input: LIMIT = 5 × 10^15

1. Calculate upper bound for n:
     max_n = floor(sqrt(LIMIT / 2))
2. Initialize count = 0
3. Split range [1, max_n] into chunks for parallel processing
4. For each chunk in parallel:
     For n from chunk_start to chunk_end:
       a. Calculate p = 2n(n+1) + 1
          (using bit shift: p = ((n × (n+1)) << 1) + 1)
       b. If p >= LIMIT, skip
       c. Quick trial division by small primes (2, 3, 5, ..., 61)
          If divisible by any, skip (unless p equals that prime)
       d. Apply deterministic Miller-Rabin with the first twelve
          prime bases {2, 3, 5, ..., 37} (deterministic for all n < 2^64)
       e. If prime, increment local count
5. Sum all local counts from parallel workers
6. Return total count
```
The key optimizations:
- Mathematical reduction: From O(n²) double loop to O(n) single loop
- Bit manipulation: Replace multiplication by 2 with left shift
- Trial division: Eliminate ~90% of composites cheaply
- Deterministic primality: No probabilistic false positives
- Parallelization: Use all CPU cores with balanced chunks
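Putting the pieces together, here's a condensed Go sketch of the whole pipeline. Treat it as an outline of my final program rather than a verbatim dump (it reuses the passesTrial and deterministicMillerRabin64 helpers sketched earlier):

```go
package main

import (
    "fmt"
    "math"
    "runtime"
    "sync"
    "sync/atomic"
)

// countPanaitopolPrimes counts primes of the form 2n(n+1)+1 below limit,
// splitting [1, maxN] into one contiguous chunk per worker goroutine.
func countPanaitopolPrimes(limit uint64, workers int) uint64 {
    // Largest n with 2n(n+1)+1 < limit, corrected for float rounding.
    maxN := uint64(math.Sqrt(float64(limit) / 2))
    for 2*maxN*(maxN+1)+1 >= limit {
        maxN--
    }
    for 2*(maxN+1)*(maxN+2)+1 < limit {
        maxN++
    }

    var total uint64
    var wg sync.WaitGroup
    chunk := maxN/uint64(workers) + 1
    for w := 0; w < workers; w++ {
        lo := uint64(w)*chunk + 1
        if lo > maxN {
            break
        }
        hi := lo + chunk - 1
        if hi > maxN {
            hi = maxN
        }
        wg.Add(1)
        go func(lo, hi uint64) {
            defer wg.Done()
            var local uint64
            for n := lo; n <= hi; n++ {
                p := ((n * (n + 1)) << 1) + 1 // 2n(n+1) + 1
                if passesTrial(p) && deterministicMillerRabin64(p) {
                    local++
                }
            }
            atomic.AddUint64(&total, local)
        }(lo, hi)
    }
    wg.Wait()
    return total
}

func main() {
    const limit = 5_000_000_000_000_000 // 5 × 10^15
    fmt.Println(countPanaitopolPrimes(limit, runtime.NumCPU()))
}
```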
Final Thoughts
What started as a brute force two-dimensional search became a clean, optimized solution through:
- Mathematical simplification (the biggest win)
- Algorithmic optimization (custom primality testing)
- Language selection (Go's performance)
- Careful parallelization (just enough, not too much)
- Micro-optimizations (bit operations, trial division)
The journey from hours to 5 seconds taught me that before throwing compute power at a problem, always ask: "Can this be expressed more elegantly?" Often, the answer transforms the impossible into the trivial.
Final answer: ?????? Panaitopol primes exist below $5 \times 10^{15}$. I hope you have fun figuring out the answer yourself! 😄
Want to try this yourself? The optimized Go implementation is sketched above. For the mathematically curious, I recommend starting with the algebraic derivation - it's a beautiful piece of number theory.
Bonus: What's Next?
If you enjoyed this optimization journey, you might also like:
- Exploring SIMD instructions for even faster primality testing
- Investigating why certain (a, b) pairs produce primes more often
- Generalizing to other forms of specialized primes
- Applying these optimization patterns to other Project Euler problems
What Project Euler problems have sent you down similar rabbit holes? I'd love to hear about your optimization journeys!