Tying Random Ends (Adventures in Optimization)

Simulating, curve-fitting, optimizing, and ultimately solving a probability puzzle about tying random string ends together.

In an effort to stave off the absolute existential dread that accompanies me with both a) doing my Master's in AI and b) being a software engineer in 2026, I find it soothing to go back to my roots and solve interesting problems for the hell of it.

This 3blue1brown YouTube short recently popped into my feed:

In short, the problem is this:

Suppose you drop nn random strings of yarn into a box. Then you randomly pick two ends of the strings and connect them. Sometimes, you will make a loop of string. Sometimes you will not. You repeat this process until there are no more ends left. For 50 strings, what is the expected number of loops that you will end up with?

It's short problems like these that I really enjoy solving. I like to call these kinds of questions "intellectual junk food" - small, tight, neat little problems that are made to be solved, and usually have a small "eureka moment" to give you that hit of dopamine your brain desperately craves. (Though I guess a more common term would be "puzzles" now that I think about it...) These questions give you a nice break from the messy real world, where solutions aren't so clear-cut.

When solving these types of problems, especially when dealing with probability, I like to do a computer simulation at first. However, something clicked this time when I watched this video. I realized that I could also come up with various functions, give them parameters, and fit the data to it using PyTorch. Cue all the actuaries giving me a disapproving look for just now realizing this, but hey, we all get there eventually, right?

Creating the Simulation

First up was the simulation logic. This is pretty self-explanatory. Since each string is fully defined by its ends, that's the only thing I have to keep track of. I number the ends from 0 up to 2n-1, put them in a list, and keep track of each end's corresponding other end on the string.

Now, note that when you connect two ends, they get removed from consideration. They either form a loop, or they get attached to another string. Either way, they aren't "ends" anymore.

Thus, we can remove the ends from the list. If the ends already correspond to each other, then we have a loop! Otherwise, we made a string and need to make the ends of this new string correspond to each other.

def perform(n_strings: int) -> int:
  all_ends: list[int] = list(range(2*n_strings))
  end_map: dict[int, int] = {}
  for a, b in batched(all_ends, 2):
    end_map[a] = b
    end_map[b] = a

  result = 0

  while len(all_ends) != 0:
    # Remove a random end from the list
    a = all_ends.pop(random.randrange(len(all_ends)))
    b = all_ends.pop(random.randrange(len(all_ends)))

    # Find the other corresponding end
    a_corresp = end_map[a]

    # If they belong to the same string, then increment result
    if a_corresp == b:
      result += 1
      continue

    b_corresp = end_map[b] # else, merge

    end_map[a_corresp] = b_corresp
    end_map[b_corresp] = a_corresp

  return result

I then sample this for a bunch of strings and trials for each string length.

def run_trials_python(
  max_n_strings: int,
  n_trials: int,
) -> pd.DataFrame:
  rows = []
  for n_strings in tqdm(range(1, max_n_strings + 1), desc="Gathering"):
    for _ in range(n_trials):
      rows.append({'x': n_strings, 'y': perform(n_strings)})

  return pd.DataFrame(rows)

Next up is to code some PyTorch models with tunable parameters. I have a feeling that this is going to increase sharply at first, and then slowly as the number of strings increase. I come up with 4 functions: one based on the harmonic series, a logarithm, a power function, and a polynomial function.

class HarmonicModel(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.a = torch.nn.Parameter(torch.tensor(1.0))
    self.b = torch.nn.Parameter(torch.tensor(0.0))
    self._H_cache: torch.Tensor | None = None  # Expensive!

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    if self._H_cache is None or self._H_cache.shape != x.shape:
      self._H_cache = torch.tensor([
        sum(1.0/k for k in range(1, int(n)+1)) for n in x.numpy()
      ])
    return self.a * self._H_cache + self.b

  def pretty(self) -> str:
    return f"{self.a.item():.4f} * H(n) + {self.b.item():.4f}"

class LogModel(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.a = torch.nn.Parameter(torch.tensor(1.0))
    self.b = torch.nn.Parameter(torch.tensor(0.0))

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.a * torch.log(x) + self.b

  def pretty(self) -> str:
    return f"{self.a.item():.4f} * ln(n) + {self.b.item():.4f}"

class PowerModel(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.a = torch.nn.Parameter(torch.tensor(1.0))
    self.b = torch.nn.Parameter(torch.tensor(0.5))

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.a * x ** self.b

  def pretty(self) -> str:
    return f"{self.a.item():.4f} * n^{self.b.item():.4f}"

class PolynomialModel(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.a = torch.nn.Parameter(torch.tensor(0.01))
    self.b = torch.nn.Parameter(torch.tensor(0.1))
    self.c = torch.nn.Parameter(torch.tensor(0.0))

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.a * x**2 + self.b * x + self.c

  def pretty(self) -> str:
    return f"{self.a.item():.4f}n² + {self.b.item():.4f}n + {self.c.item():.4f}"

Now, fundamentally what we are doing is taking a large number of samples from model defined in perform, all independent and identically distributed. Because of the central limit theorem, we know that the sample mean becomes a good approximation of the true mean of the underlying distribution. In this case, that's the expected number of created loops which is exactly what we are trying to find!

Thus, I can fit the hypotheses using Adam and MSELoss, using the mean as the data. I also log the 95% confidence interval and r-squared for good measure.

def fit_hypotheses(
    trials_df: pd.DataFrame,
    models: dict[str, torch.nn.Module],
    lr: float = 0.01,
    epochs: int = 5000,
) -> tuple[pd.DataFrame, pd.DataFrame]:
  plot_df = trials_df.groupby('x')['y'].agg(
    mean='mean', std='std', count='count').reset_index()
  plot_df['ci95'] = 1.96 * plot_df['std'] / np.sqrt(plot_df['count'])

  x = torch.tensor(plot_df['x'].values, dtype=torch.float32)
  y = torch.tensor(plot_df['mean'].values, dtype=torch.float32)

  meta_rows = []

  for name, model in tqdm(models.items(), desc="Fitting", total=len(models)):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()

    for _ in range(epochs):
      optimizer.zero_grad()
      loss = loss_fn(model(x), y)
      loss.backward()
      optimizer.step()

    with torch.no_grad():
      fitted = model(x)
      ss_res = torch.sum((fitted - y) ** 2).item()
      ss_tot = torch.sum((y - y.mean()) ** 2).item()
      r2 = 1 - ss_res / ss_tot
      plot_df[name] = fitted.numpy()

    pretty = model.pretty()
    print(f"  {name}: {pretty} (R²={r2:.4f})")
    meta_rows.append({'name': name, 'pretty': pretty, 'r2': r2})

  meta_df = pd.DataFrame(meta_rows)
  return plot_df, meta_df

Then plot:

def plot(
    plot_df: pd.DataFrame,
    meta_df: pd.DataFrame,
    filename: str,
) -> None:
  plt.figure(figsize=(10, 6))
  plt.plot(plot_df['x'], plot_df['mean'], label='mean', linewidth=2)
  plt.plot(plot_df['x'], plot_df['mean'] + plot_df['ci95'],
           linestyle='--', color='gray', alpha=0.5, label='95% CI')
  plt.plot(plot_df['x'], plot_df['mean'] - plot_df['ci95'],
           linestyle='--', color='gray', alpha=0.5)

  for _, row in meta_df.iterrows():
    label = f"{row['name']}: {row['pretty']} (R²={row['r2']:.4f})"
    plt.plot(plot_df['x'], plot_df[row['name']], label=label)

  plt.xlim(left=0)
  plt.ylim(bottom=0)
  plt.xlabel('n (number of strings)')
  plt.ylabel('E(n) (expected loops)')
  plt.legend()
  plt.tight_layout()
  plt.savefig(filename, dpi=150)
  plt.close()

And the main function:

def main() -> None:
  max_n_strings = 1000

  print(f"--- Phase 1: Gather data {max_n_strings=} ---")
  start_time = time.time()
  trials_df = run_trials_python(max_n_strings, 100)
  print(f"  Took {time.time() - start_time:.2f} seconds.")

  print("--- Phase 2: Fit hypotheses ---")
  start_time = time.time()
  models: dict[str, torch.nn.Module] = {
    'harmonic': HarmonicModel(),
    'log': LogModel(),
    'power': PowerModel(),
    'polynomial': PolynomialModel(),
  }
  plot_df, meta_df = fit_hypotheses(trials_df, models)
  print(f"  Took {time.time() - start_time:.2f} seconds.")

  print("--- Phase 3: Plot ---")
  plot(plot_df, meta_df, 'random-ends-v1.png')
  print("  Saved to random-ends-v1.png")

And, in less than a second, out popped an image!

100 string lengths, 100 trials each.

Good data! However, I'd like to have more trials per string length to tighten that CI. Additionally, I'd like to extend the graph out further to the right. Let's go to run_trials_python(1000, 1000) and see.

I ran the program and...

--- Phase 1: Gather data max_n_strings=1000 ---
  Took 484.36 seconds.
--- Phase 2: Fit hypotheses ---
harmonic: 0.5031 * H(n) + 0.6705 (=0.9896)
log: 0.4982 * ln(n) + 0.9913 (=0.9898)
power: 1.3690 * n^0.1752 (=0.9278)
polynomial: -0.0000n² + 0.0155n + -0.0904 (=-2.8770)
  Took 5.96 seconds.
--- Phase 3: Plot ---
  Saved to random-ends-v1.png

484.36 seconds. The speed was bugging me. 484.36 seconds? Surely it can be faster. Wait a minute... A main function that calls a perform function? Making things efficient? Perhaps I can use... C++? Like a sleeper agent, all of my competitive programming experience came rushing back to me.

Longing. Rusted. Seventeen. Daybreak. Furnace. Nine. Benign. Homecoming. One. Freight Car.

Making it faster

A few insights:

First, this problem is embarrassingly parallel. Each trial is independent of one another. We could totally take advantage of more cores and get a pretty nice speedup.

Next, we don't actually have to remove the ends from the all_ends list. Once we're done with them, we never look at those ends again. Another equivalent way to get this functionality is to simply generate a list of all ends, shuffle them, then go through the list linearly. This is great because, say it with me, heap allocations are slow!

Furthermore, I'm reminded of this famous quote by Robert Galanakis:

The fastest code is the code that does not run.

Robert Galanakis

We are definitely performing too many trials on the smaller string lengths and not enough on the larger ones. Recall the central limit theorem. We can take kk samples from an arbitrary distribution. We know that the mean of these samples is approximately normally distributed with mean μ\mu and variance σ2/k\sigma^2/k. The standard deviation of this distribution, σ/k\sigma/\sqrt{k}, is called the standard error. We don't know the true σ\sigma, but we can approximate it using Bessel's correction: s2=i=1k(xixˉ)2/(k1)s^2 = \sum_{i=1}^{k} (x_i - \bar{x})^2 / (k - 1).

So instead of doing a fixed number of trials, we can keep running trials until the standard error drops below a desired threshold.

Finally, we could definitely use memory more efficiently. Why are we using a dictionary when an array would do just fine? Why not allocate all the memory we could possibly need at the beginning?

Putting it all together, the core simulation function looks like this. Each thread grabs the next unprocessed string count from an atomic counter, runs trials until the standard error is tight enough, and moves on:

void perform_adaptive(
  vector<pair<ul, ul>>& results, // Elements are { # strings, # loops made }
  atomic<ul>& next_n_strings,    // Atomic to grab next string size to calc
  const ul max_n_strings,        // Largest number of strings to calc
  const ul min_trials,           // Min trials before kicking in standard error
  const double epsilon           // Standard error to target
) {
  mt19937 rng(random_device{}());
  const ul max_n_ends = 2*max_n_strings;

  auto end_seq = vector<ul>(max_n_ends); // Pre-allocate based on maximum ends
  auto end_map = vector<ul>(max_n_ends); // we could possibly need.

  for (ul i = 0; i < max_n_ends; i++) // n_strings only increases, so we set
    end_seq[i] = i;                   // end_seq only once.

  ul n_strings;
  while ((n_strings = next_n_strings.fetch_add(1)) <= max_n_strings) {
    const ul n_ends = 2*n_strings;

    ul sum = 0, sum_sq = 0, n = 0; // Stats for standard error stopping
    while (true) {
      // While we don't need to reset end_seq, we do need to reset end_map
      for (ul i = 0; i < n_ends; i += 2) {
        end_map[i]   = i+1; // Could do `end_map[i] = i^1;` with `i++`, but the
        end_map[i+1] = i;   // compiler optimizes it the same way
      }

      shuffle(end_seq.begin(), end_seq.begin() + n_ends, rng);

      ul result = 1; // Always at least 1 loop, so no need to calc last string
      for (ul i = 0; i < n_ends - 2; i += 2) {
        const ul a = end_seq[i];
        const ul b = end_seq[i+1];

        const ul a_other = end_map[a];
        if (a_other == b) {
          result++;
          continue;
        }

        const ul b_other = end_map[b];

        end_map[a_other] = b_other;
        end_map[b_other] = a_other;
      }

      results.push_back({n_strings, result});

      sum    += result;
      sum_sq += result * result;
      n++;
      if (n >= min_trials) {
        double var = ((double)sum_sq - (double)sum * sum / n) / (n - 1);
        double se = sqrt(var / n);
        if (se <= epsilon / 1.96) break; // 95% CI
      }
    }
  }
}

Note the result = 1 on line 39 - the last remaining string must form a loop, so we skip that iteration entirely. It's a tiny optimization, but it made me feel clever.

The main function just spins up threads and collects results. Each thread gets its own pre-allocated result vector to avoid contention:

int main(int argc, char** argv) {
  ios_base::sync_with_stdio(false); // Competitive programming stuff to make
  cin.tie(nullptr);                 // cout faster

  const ul max_n_strings    = stoul(argv[1]);
  const ul min_trials       = stoul(argv[2]);
  const double epsilon      = stod(argv[3]);
  const ul pre_alloc_trials = stoul(argv[4]);

  const ul n_threads = thread::hardware_concurrency() ?: 4;

  atomic<ul> next_n_strings(1);
  vector<thread> threads;

  vector<vector<pair<ul, ul>>> thread_results(n_threads);
  for (auto& v : thread_results)                   // Pre-alloc result arrays
      v.reserve(max_n_strings * pre_alloc_trials); // for performance

  for (ul t = 0; t < n_threads; t++)
    threads.emplace_back(
      perform_adaptive, ref(thread_results[t]), ref(next_n_strings),
      max_n_strings, min_trials, epsilon);

  for (auto& t : threads) t.join();

  cout << "x,y\n"; // Output CSV to stdout
  for (auto& tr : thread_results)
    for (auto& [x, y] : tr)
      cout << x << "," << y << "\n";

  return 0;
}

There are many forms of inter-process communication, but I decided to just simply spit the results out to stdout as a csv file. Taking everything into account, need we update the Python script to shell out to the compiled binary and read the CSV output back in:

def run_trials_cpp(
  max_n_strings: int,
  min_trials: int = 10,
  epsilon: float = 0.1,
  pre_alloc_trials: int = 1000,
) -> pd.DataFrame:
  result = subprocess.run(
    ['./random-ends-v2',
     str(max_n_strings), str(min_trials),
     str(epsilon), str(pre_alloc_trials)],
    stdout=subprocess.PIPE, stderr=None, text=True, check=True,
  )
  return pd.read_csv(io.StringIO(result.stdout))

Moment of truth:

--- Phase 1: Gather data max_n_strings=1000 ---
  Took 0.94 seconds.
--- Phase 2: Fit hypotheses ---
harmonic: 0.5055 * H(n) + 0.6530 (=0.9895)
log: 0.5006 * ln(n) + 0.9753 (=0.9897)
power: 1.3651 * n^0.1756 (=0.9286)
polynomial: -0.0000n² + 0.0155n + -0.0904 (=-2.8137)
  Took 6.12 seconds.
--- Phase 3: Plot ---
  Saved to random-ends-v2.png

484 seconds down to 0.94. A 515x speedup. I'm not going to pretend I didn't fist pump at my desk.

1,000 string lengths with adaptive trials, completed in under a second.

So obviously the polynomial approximation is right out. The power model also seems to drift. That leaves the harmonic model and the natural log model, both with suspiciously similar R2R^2 values. Suspicious enough to make you wonder if they're secretly the same thing...

Making it even faster

In my thirst for optimization, I tried to think about the problem in a slightly different manner. I began to think about how I could potentially eliminate the end_seq and end_map vectors altogether.

In the dark recesses of my brain, I conjured up a vague idea of feistel ciphers, but they only work for powers of two. While yes they are O(1)O(1) in space complexity, in the worst case, for an arbitrary nn, we will have to perform 2log2(n)2^{\lfloor \log_2( n ) \rfloor} operations. No, this will not do.

Instead, I reached into my hat of tricks honed from competitive programming, and pulled out "relabeling".

Right now, we are labeling all of the ends from 00 to 2n12n-1 at the start of the program. Recall that at each step, we either make a loop or we don't. Another way to phrase this is that we go from having kk strings to k1k-1 strings and 00 or 11 loops.

But notice that once we consume 2 ends, they never matter anymore. Poof! They are gone, never to be used at a future step. That's why we could get away with shuffling end_seq.

In the same vein, once a loop at a step is made, from the perspective of the remaining strings, it doesn't matter anymore either. A loop can't be used to make another loop, nor can it be used to make a longer string.

Another way to view this is that an end is either part of a string or used-up.

If an end a is part of a string it means that we have not encountered it in end_seq yet, and end_map[a] = b; end_map[b] = a holds. If an end is used-up, that means we have encountered it in end_seq and either it was a loop so end_map[a] = b; end_map[b] = a still holds, or a now points to something that does not ever point back to a. Crucially, after encountering a in end_seq, a is never referenced by a non-used-up end ever again.

So what if, after every step, we removed the used-up elements and relabeled every end from 00 to (now) 2n32n-3? At first, this seems like a step backward. Indeed, if we had to go through the end_seq and end_map vectors and relabel everything all over again, it take O(n)O(n). And then we would have to shuffle end_seq again!

...or would we?

Making it trivial

We are using a shuffled end_seq because we need a random permutation of indices so that at each step we can select the next two indices from the list and perform the calculation. But now we are relabeling them at each step. So what we would do instead is at each step, generate a list of numbers from 0 to n_ends, shuffle it and pick the first two.

That's choosing without replacement!

So instead, we can generate 2 random numbers from 0 to n_ends, making sure they are not equal. And since we "relabeled" each string at each step, we know that a loop can only happen if we select numbers like (0,1)(0, 1), (1,2)(1,2), (3,4)(3,4), etc...

So the inner part of the function becomes:

ul result = 0;
for (ul i = 0; i < n_ends; i += 2) {
  uniform_int_distribution<ul> distr(0, n_ends - 1 - i);

  const ul a = distr(rng);
  ul b = distr(rng);
  while (a == b) b = distr(rng);

  result += (a ^ 1) == b ? 1 : 0;
}

But let's take this relabelling idea one step further. Say we select one end. Only one other end completes a loop, and there are n_ends - 1 other ends to choose from. So we just need to check if a random pick from those remaining ends happens to be the partner.

ul result = 0;
for (ul i = 0; i < n_ends; i += 2) {
  uniform_int_distribution<ul> distr(0, (n_ends - 1) - i - 1);
  result += (distr(rng) == 0) ? 1 : 0;
}

But wait a second. Why are we generating this data? We're generating the data to get the expected value of the process. But looking at this for loop, we can calculate that directly. The result is 11 with probability 1n_ends1i\frac{1}{\texttt{n\_ends} - 1 - i} and 00 otherwise. So the expected value of this iteration is exactly 1n_ends1i\frac{1}{\texttt{n\_ends} - 1 - i}.

And the expected value of a sum, is the sum of expected values. And we can actually flip that for loop. And hey loop overlapping sub-problems...

Oh no... I can feel the problem collapsing on a solution...

#include <cstdio>
#include <string>

typedef unsigned long ul;

int main(int argc, char** argv) {
  const ul max_n_strings = std::stoul(argv[1]);

  double ev = 0.0;
  for (ul n_strings = 1; n_strings <= max_n_strings; n_strings++) {
    ev += 1.0 / (2*n_strings - 1);          // Don't store anything for SPEED!
    fwrite(&ev, sizeof(double), 1, stdout); // Output raw doubles for SPEED!
  }
}

And update the python:

def run_exact_cpp(
  max_n_strings: int,
) -> pd.DataFrame:
  result = subprocess.run(
    ['./random-ends-v3', str(max_n_strings)],
    stdout=subprocess.PIPE, check=True,
  )
  ev = np.frombuffer(result.stdout, dtype=np.float64)
  return pd.DataFrame({'x': np.arange(1, len(ev) + 1), 'y': ev})

And bump up the simulation to 10000, why not.

--- Phase 1: Gather data max_n_strings=10000 ---
  Took 0.00 seconds.
--- Phase 2: Fit hypotheses ---
harmonic: 0.5013 * H(n) + 0.6814 (=1.0000)
log: 0.5000 * ln(n) + 0.9815 (=1.0000)
power: 0.8659 * n^0.2109 (=0.3192)
  Took 8.69 seconds.
--- Phase 3: Plot ---
  Saved to random-ends-v3.png

0.00 seconds. Now that's fast. And the plot:

10,000 string lengths with exact expected values, computed in under a millisecond.

I removed the polynomial model as it was just wildly wrong at this point.

Doing the math

Let's formalize what our code optimizations just discovered. Through relabeling, we reduced the simulation to this: at each step, there are 2ni2n - i ends remaining (where ii steps by 2). We pick one end. Of the 2ni12n - i - 1 other ends, exactly one is its partner. So the probability of a loop is 12ni1\frac{1}{2n - i - 1}, which is exactly the distr(rng) == 0 check from our second-to-last reduction.

More formally: define XiX_i as 1 if the ii-th pairing creates a loop, 0 otherwise. By linearity of expectation:

E[total loops]=i=1nE[Xi]=i=1nP(loop at step i)\mathbb{E}[\text{total loops}] = \sum_{i=1}^{n} \mathbb{E}[X_i] = \sum_{i=1}^{n} P(\text{loop at step } i)

At step ii, there are 2(ni+1)2(n - i + 1) ends. We pick one, leaving 2(ni+1)1=2n2i+12(n - i + 1) - 1 = 2n - 2i + 1 others. One is the partner:

P(loop at step i)=12n2i+1=12(ni)+1P(\text{loop at step } i) = \frac{1}{2n - 2i + 1} = \frac{1}{2(n - i) + 1}

Substituting k=nik = n - i:

E[total loops]=k=0n112k+1=11+13+15++12n1\mathbb{E}[\text{total loops}] = \sum_{k=0}^{n-1} \frac{1}{2k+1} = \frac{1}{1} + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2n-1}

The sum of odd reciprocals - which is exactly what ev += 1.0 / (2*n_strings - 1) computes in random-ends-v3.cpp. The code was the proof; we just didn't know it yet.

And that's why the harmonic model fit so well. The nn-th harmonic number is Hn=k=1n1kH_n = \sum_{k=1}^{n} \frac{1}{k}, which can be split into even and odd terms. Our sum is the odd half-harmonic (split H2nH_{2n} into its odd and even terms, noting k=1n12k=12Hn\sum_{k=1}^{n} \frac{1}{2k} = \frac{1}{2}H_n):

k=1n12k1=H2n12Hn12Hn+ln2\sum_{k=1}^{n} \frac{1}{2k-1} = H_{2n} - \frac{1}{2}H_n \approx \frac{1}{2}H_n + \ln 2

Look familiar? That's exactly the 0.5013 * H(n) + 0.6814 from the fit — since ln20.6931\ln 2 \approx 0.6931.

And since Hnln(n)+γH_n \approx \ln(n) + \gamma (where γ0.5772\gamma \approx 0.5772 is the Euler-Mascheroni constant), we get that the expected number of loops grows like 12ln(n)\frac{1}{2}\ln(n). That's why the log model and the harmonic model had nearly identical R2R^2 values - they're basically the same thing up to constants! The harmonic series and ln\ln are joined at the hip, and always have been.

For the n=50n = 50 case, we can use Python's Fraction class to get the exact answer:

from fractions import Fraction
n = 50
result = sum(Fraction(1, 2*k-1) for k in range(1, n+1))
print(f'~ {float(result):.10f}')
print(f'{result}')
~ 2.9377748485
3200355699626285671281379375916142064964/1089380862964257455695840764614254743075

Just shy of 3 loops.

Conclusion

So why did I spend a weekend writing a C++ program with multithreading, adaptive sampling, and PyTorch curve fitting to solve a problem that has a two-line closed-form solution? I genuinely don't know. But I had a blast doing it.

I think there's something to be said for the "brute force it first, think later" approach. The simulation gave me intuition. The curve fitting told me what shape the answer had. And only then did the math click into place. Would I have seen the odd reciprocal sum staring at a blank page? Maybe. But probably not as fast as I did staring at a graph that was screaming "I'm a log, you idiot."

And hey, I got to write some C++ for the first time in a while. That alone was worth it.


Source files: