I saw this list here and couldn’t believe there were so many ways to solve least squares. The “normal equations” on Wikipedia seemed to be a fairly straight forward way:

$$

{displaystyle {begin{aligned}{hat {alpha }}&={bar {y}}-{hat {beta }},{bar {x}},\{hat {beta }}&={frac {sum _{i=1}^{n}(x_{i}-{bar {x}})(y_{i}-{bar {y}})}{sum _{i=1}^{n}(x_{i}-{bar {x}})^{2}}}end{aligned}}}

$$

So why not just use them? I assumed there must be a computational or precision issue given that in the first link above Mark L. Stone mentions that SVD or QR are popular methods in statistical software and that the normal equations are “TERRIBLE from reliability and numerical accuracy standpoint”. *However*, in the following code, the normal equations are giving me accuracy to ~12 decimal places when compared to three popular python functions: numpy’s polyfit; scipy’s linregress; and scikit-learn’s LinearRegression.

What’s more interesting is that the normal equation method is the fastest when n = 100000000. Computational times for me are: 2.5s for linregress; 12.9s for polyfit; 4.2s for LinearRegression; and 1.8s for the normal equation.

Code:

```
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import linregress
import timeit
b0 = 0
b1 = 1
n = 100000000
x = np.linspace(-5, 5, n)
np.random.seed(42)
e = np.random.randn(n)
y = b0 + b1*x + e
# scipy
start = timeit.default_timer()
print(str.format('{0:.30f}', linregress(x, y)[0]))
stop = timeit.default_timer()
print(stop - start)
# numpy
start = timeit.default_timer()
print(str.format('{0:.30f}', np.polyfit(x, y, 1)[0]))
stop = timeit.default_timer()
print(stop - start)
# sklearn
clf = LinearRegression()
start = timeit.default_timer()
clf.fit(x.reshape(-1, 1), y.reshape(-1, 1))
stop = timeit.default_timer()
print(str.format('{0:.30f}', clf.coef_[0, 0]))
print(stop - start)
# normal equation
start = timeit.default_timer()
slope = np.sum((x-x.mean())*(y-y.mean()))/np.sum((x-x.mean())**2)
stop = timeit.default_timer()
print(str.format('{0:.30f}', slope))
print(stop - start)
```