Calculating p-values in a constrained least squares regression

I have been using Matlab to perform unconstrained (ordinary) least squares, and it automatically outputs the coefficients, test statistics and p-values.

My question is: when I perform constrained least squares (coefficients constrained to be strictly nonnegative), it outputs only the coefficients, without test statistics or p-values.

Is it possible to calculate these values to assess significance? And why are they not directly available in the software (or in any other software, for that matter)?
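
For what it's worth, one workaround (not a built-in Matlab feature; this is just a sketch, assuming resampling is acceptable for your data) is to bootstrap the constrained fit, for example with scipy.optimize.nnls in Python, and inspect the resulting sampling distributions:

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.standard_normal(n)

# Nonnegative least squares fit
beta_hat, _ = nnls(X, y)

# Bootstrap the rows to approximate the sampling distribution of each coefficient
B = 2000
boot = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, n)
    boot[b], _ = nnls(X[idx], y[idx])

# Caveat: under the constraint, the null value beta_j = 0 lies on the boundary of
# the parameter space, so the usual two-sided p-value is not well defined; the
# fraction of bootstrap fits pinned at exactly zero is one rough diagnostic.
print(beta_hat)
print((boot == 0.0).mean(axis=0))

The boundary issue in the comment is also, as far as I understand, the main reason software does not report p-values here: the usual t-statistics assume an unconstrained estimator with an asymptotically normal distribution, which fails when the true coefficient can sit on the constraint boundary.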

Why not use the “normal equations” to find simple least squares coefficients?

I saw this list here and couldn’t believe there were so many ways to solve least squares. The “normal equations” on Wikipedia seemed to be a fairly straightforward way:
$$
\begin{aligned}
\hat{\alpha} &= \bar{y} - \hat{\beta}\,\bar{x}, \\
\hat{\beta} &= \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
\end{aligned}
$$

So why not just use them? I assumed there must be a computational or precision issue, given that in the first link above Mark L. Stone mentions that SVD or QR are popular methods in statistical software and that the normal equations are “TERRIBLE from reliability and numerical accuracy standpoint”. However, in the following code the normal equations give me accuracy to ~12 decimal places when compared to three popular Python functions: numpy’s polyfit, scipy’s linregress, and scikit-learn’s LinearRegression.

What’s more interesting is that the normal-equation method is the fastest at n = 100000000. My computational times are: 2.5 s for linregress, 12.9 s for polyfit, 4.2 s for LinearRegression, and 1.8 s for the normal equations.

Code:

import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import linregress
import timeit

b0 = 0
b1 = 1
n = 100000000
x = np.linspace(-5, 5, n)
np.random.seed(42)
e = np.random.randn(n)
y = b0 + b1*x + e

# scipy                                                                                                                                     
start = timeit.default_timer()
print(str.format('{0:.30f}', linregress(x, y)[0]))
stop = timeit.default_timer()
print(stop - start)

# numpy                                                                                                                                      
start = timeit.default_timer()
print(str.format('{0:.30f}', np.polyfit(x, y, 1)[0]))
stop = timeit.default_timer()
print(stop - start)

# sklearn                                                                                                                                    
clf = LinearRegression()
start = timeit.default_timer()
clf.fit(x.reshape(-1, 1), y.reshape(-1, 1))
stop = timeit.default_timer()
print(str.format('{0:.30f}', clf.coef_[0, 0]))
print(stop - start)

# normal equation                                                                                                                            
start = timeit.default_timer()
slope = np.sum((x-x.mean())*(y-y.mean()))/np.sum((x-x.mean())**2)
stop = timeit.default_timer()
print(str.format('{0:.30f}', slope))
print(stop - start) 
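
That said, the warning about the normal equations is about conditioning rather than about benign one-regressor problems like the one above: forming X'X squares the condition number of the design. A minimal sketch (my own, with an intentionally ill-conditioned polynomial design) of where the difference shows up:

import numpy as np

# Raw polynomial design on [0, 1]: notoriously ill-conditioned as the degree grows
x = np.linspace(0, 1, 1000)
X = np.column_stack([x**k for k in range(8)])
beta_true = np.ones(8)
y = X @ beta_true

# Normal equations: cond(X'X) = cond(X)**2, so precision degrades sharply
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# QR/SVD-based solver, as used by numpy.linalg.lstsq
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.linalg.cond(X))                      # large condition number
print(np.abs(beta_normal - beta_true).max())  # typically much larger error...
print(np.abs(beta_lstsq - beta_true).max())   # ...than the orthogonalization route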

Unsolved Mysteries: Magic Square of Squares

This is the first in what will hopefully be a series of Unsolved Mysteries posts.

Note that this puzzle has no known solution, nor any proof that a solution is impossible. We will see how smart the denizens of Puzzling.SE actually are…!


Most people are familiar with the concept of a Magic Square. (If not, follow the link to read up on it.)

There are algorithms available that make it trivial to construct a magic square of almost any size, but by adding a few constraints to the problem, it becomes much more challenging.

Consider the following $4\times4$ magic square, where every entry is itself a square number, and the rows, columns and diagonals all sum to $8515$:

$$
\begin{array}{cccc}
68^2 & 29^2 & 41^2 & 37^2 \\
17^2 & 31^2 & 79^2 & 32^2 \\
59^2 & 28^2 & 23^2 & 61^2 \\
11^2 & 77^2 & 8^2 & 49^2
\end{array}
$$

Note that
$68^2 + 29^2 + 41^2 + 37^2 = 17^2 + 31^2 + 79^2 + 32^2$
but
$68 + 29 + 41 + 37 \ne 17 + 31 + 79 + 32$

Only the squared values have the properties of a magic square.
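
(A quick numerical check of these properties, for anyone who wants to verify the square above:)

import numpy as np

m = np.array([[68, 29, 41, 37],
              [17, 31, 79, 32],
              [59, 28, 23, 61],
              [11, 77,  8, 49]]) ** 2

print(m.sum(axis=1))                         # rows: all 8515
print(m.sum(axis=0))                         # columns: all 8515
print(np.trace(m), np.trace(np.fliplr(m)))   # diagonals: 8515, 8515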

Many such $4\times4$ squares have been constructed but, as yet, no one has succeeded in constructing a $3\times3$ magic square with the same property, nor in proving that no such magic square exists.

Your challenge, therefore, is as follows:

A) Build a $3\times3$ magic square where each of the nine entries in the square is itself a square number.

or

B) Prove that no such square exists.


For the pedantic among us (you know who you are), here are a few additional constraints:

  • Each entry in the square must be unique. (A square consisting entirely of $4$s is not valid.)
  • The definition of “square number” implies this, but I will spell it out here for those who like to quibble: the entries (before squaring) must be integers. Thus a magic square using the values $\{\sqrt{1}^2, \sqrt{2}^2, \sqrt{3}^2, \sqrt{4}^2, \sqrt{5}^2, \sqrt{6}^2, \sqrt{7}^2, \sqrt{8}^2, \sqrt{9}^2\}$ is not valid (although, of course, $\sqrt{1}^2$, $\sqrt{4}^2$ and $\sqrt{9}^2$ can be used in a square, being proper square numbers $(=1^2, 2^2, 3^2)$).
  • This also means that using complex numbers, limits, representations of infinity, or any other abstract mathematical concept is not valid. The intent of the question is obvious; please stick to that.

Estimate $\sigma$ from data given the total frequency and the sum of squares of the frequencies.

In a frequency table with classes of equal width, $N_1$ is the total frequency and $N_2$ is the sum of squares of the frequencies. Assuming the distribution to be approximately normal with mean $0$, I’m asked to suggest a quick estimate of the standard deviation $\sigma$ in terms of $N_1$ and $N_2$.

My approach: I’ve first deduced the formula for the integral of the square of the density of an $N(\mu,\sigma)$ variate (it doesn’t depend on $\mu$):

$$\int_{-\infty}^{\infty}[f(x)]^2\,\mathrm{d}x=\dfrac{1}{2\sigma\sqrt{\pi}}$$

Now, $\displaystyle\sum_{i}\dfrac{f(x_i)}{N_1}=1$, and since $N_2$ is the sum of the squared frequencies, $\displaystyle\sum_{i}\left(\dfrac{f(x_i)}{N_1}\right)^2=\dfrac{N_2}{N_1^2}.$

I think $\displaystyle\sum_{i}\left(\dfrac{f(x_i)}{N_1}\right)^2\cdot h$ resembles $\displaystyle\int_{-\infty}^{\infty}[f(x)]^2\,\mathrm{d}x=\dfrac{1}{2\sigma\sqrt{\pi}}$, where $h$ is the class width.

So,
$h\cdot\dfrac{N_2}{N_1^2}=\dfrac{1}{2\sigma\sqrt{\pi}}\implies\sigma=\dfrac{N_1^2}{2\sqrt{\pi}\,h\,N_2}.$

I’m confused about the answer, so please help me and suggest edits if it’s wrong.
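
As a numerical sanity check of the approach (my own sketch, not part of the original question), the following simulation treats $f_i/(N_1 h)$, rather than $f_i/N_1$, as the density estimate in each class; under that reading the Riemann sum gives $N_2/(N_1^2 h)=1/(2\sigma\sqrt{\pi})$, i.e. $\sigma\approx N_1^2 h/(2\sqrt{\pi}N_2)$:

import numpy as np

rng = np.random.default_rng(0)
sigma_true = 2.0
h = 0.5                                    # class width
x = rng.normal(0.0, sigma_true, 100_000)
f, _ = np.histogram(x, bins=np.arange(-10, 10 + h, h))

N1 = f.sum()                               # total frequency
N2 = (f.astype(float) ** 2).sum()          # sum of squared frequencies

# Density estimate in class i is f_i / (N1 * h); its squared Riemann sum is
# N2 / (N1**2 * h), which should approximate 1 / (2 * sigma * sqrt(pi)).
sigma_hat = N1**2 * h / (2 * np.sqrt(np.pi) * N2)
print(sigma_hat)                           # close to 2.0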

Prove that given any five integers, there will be three for which the sum of the squares of those integers…

Basically the question asks us to prove that given any integers $$x_1,x_2,x_3,x_4,x_5$$ three of them, say $$x_a,x_b,x_c$$ satisfy the equation $$x_a^2 + x_b^2 + x_c^2 = 3k$$ for some integer $k$. I know I am supposed to use the pigeonhole principle: with 5 pigeons and 2 holes, one hole must contain at least 3 pigeons. But what I am confused about is how to define the holes. Do I just say that a hole has the property that if 3 integers are in it, then their squares sum to a multiple of 3?
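
As a nudge towards defining the holes (my hint, not part of the original question), note that squares of integers fall into only two residue classes mod 3, which a one-line check confirms; three squares from the same class then sum to a multiple of $3$:

# Squares mod 3 take only the values 0 and 1
print({(n * n) % 3 for n in range(-100, 101)})   # {0, 1}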

Practical implementation of Least Squares Monte Carlo (tweaks and pitfalls)

The Longstaff-Schwartz LSM approach is nowadays ubiquitous (at least in the academic literature) for pricing path-dependent derivatives. Up to now I have mostly worked with lattice methods. My experience implementing those has shown that there are often ways to tweak them, and also a lot of pitfalls along the way.

To those of you who have some experience in working with LSM:

  1. Aside from the usual Monte Carlo optimization techniques (e.g. variance reduction, importance sampling, etc.), are there any optimizations that are particular to the LSM approach? (Perhaps some paper on the choice of the regression polynomial?)

  2. What are possible pitfalls when implementing and working with the model? When can LSM go really wrong, and in which cases does it fail to price correctly?
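
For concreteness, here is a bare-bones, hedged sketch of vanilla Longstaff-Schwartz for an American put (illustrative only: the GBM dynamics, the quadratic polynomial basis and the parameters are my choices, and the basis choice is exactly the kind of tweak question 1 is about):

import numpy as np

def lsm_american_put(S0=100.0, K=100.0, r=0.06, sigma=0.2, T=1.0,
                     steps=50, paths=100_000, seed=0):
    """Bare-bones Longstaff-Schwartz LSM price of an American put under GBM."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    disc = np.exp(-r * dt)
    z = rng.standard_normal((steps, paths))
    S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                              + sigma * np.sqrt(dt) * z, axis=0))
    cashflow = np.maximum(K - S[-1], 0.0)         # exercise at maturity
    for t in range(steps - 2, -1, -1):
        cashflow *= disc                          # discount one step back
        itm = K - S[t] > 0.0                      # regress on in-the-money paths only
        if itm.any():
            # Continuation value via least-squares regression (quadratic basis)
            coef = np.polyfit(S[t, itm], cashflow[itm], 2)
            cont = np.polyval(coef, S[t, itm])
            exercise = K - S[t, itm]
            early = exercise > cont
            cashflow[np.flatnonzero(itm)[early]] = exercise[early]
    return disc * cashflow.mean()                 # discount first step back to t = 0

print(lsm_american_put())

Even in this toy version two classic pitfalls are visible: replacing the realized cash flows with the fitted continuation value itself introduces bias, and regressing on all paths instead of only the in-the-money ones degrades the estimated exercise boundary.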

Lagrange’s four-square theorem

Lagrange’s four-square theorem states that every non-negative integer is a sum of squares of four non-negative integers. Suppose $X$ is a subset of the non-negative integers with the same property, that is, every non-negative integer is a sum of squares of four elements of $X$.

$\bullet$ Is $X=\{0,1,2,\ldots\}$?

$\bullet$ If not, what is a minimal set $X$ with the given property?
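
A brute-force checker for the property (my own sketch) makes this easy to experiment with for small bounds, e.g. dropping elements from $X$ and seeing how far the representation property survives:

from itertools import combinations_with_replacement

def represents_all(X, limit):
    """True if every integer in [0, limit] is a sum of four squares of elements of X."""
    sums = {a*a + b*b + c*c + d*d
            for a, b, c, d in combinations_with_replacement(sorted(X), 4)}
    return all(n in sums for n in range(limit + 1))

# Lagrange's theorem: the full set of non-negative integers has the property,
# and for sums up to 100 only elements up to 10 can appear.
print(represents_all(range(11), 100))   # True
# Experiment: does the property survive with some element removed?
print(represents_all([x for x in range(11) if x != 7], 100))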

BitmapFont in libGDX becomes squares after hiding and showing the screen

My Android app, written with libGDX, has a lot of text. I use FreeTypeFontGenerator to create a font:

public static BitmapFont setupHandWritingFont() {
    FreeTypeFontGenerator generator = new FreeTypeFontGenerator(Gdx.files.internal("fonts/handwriting.ttf"));
    FreeTypeFontGenerator.FreeTypeFontParameter parameter = new FreeTypeFontGenerator.FreeTypeFontParameter();
    parameter.size = 36;
    parameter.genMipMaps = true;
    parameter.magFilter = Texture.TextureFilter.Linear;
    parameter.color = Color.BLACK;
    parameter.characters = "абвгдежзийклмнопрстуфхцчшщъыьэюяabcdefghijklmnopqrstuvwxyzАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789][_!$%#@|\\/?-+=()*&.;:,{}\"´`'<>";
    BitmapFont font = generator.generateFont(parameter);
    generator.dispose();
    return font;
}

After certain actions that hide the screen for a while (such as showing an ad, or backgrounding the application), the text turns into squares. It happens very often, but not every time. The text looks like this:

[Screenshot: text rendered as squares]

Instead of this:

[Screenshot: text rendered correctly]

Has anybody faced this problem before? What can I try?

Fast way to obtain the SSR (sum of squared residuals) from QR in a least squares model?

I am running a linear regression, yet the only output I need is the sum of squared residuals (SSR); I don’t care about the coefficients. (The context is a non-linear least squares problem that is linear given one extra parameter, so I am running a grid search over that parameter and need the least-squares SSR at each grid point.)

Is there a fast algorithm to extract the SSR only, using the QR (or another) decomposition? In matrix form:

$$SSR = Y'PY = Y'(I - X(X'X)^{-1}X')Y$$

and with the QR decomposition $X = QR$ one has $X(X'X)^{-1}X' = QQ'$, which suggests one does not need to do a matrix inversion at all, though on the other hand one needs to form an $n \times n$ matrix…

So:

  1. Is there a fast algorithm to obtain the SSR only, or am I better off sticking with the usual functions, which compute the coefficients and also the residuals?

  2. Is there such an implementation in R (or in LINPACK/Fortran called by R)? I saw qr.resid(), but it seems to be slower than lm.fit(), which returns everything. See the benchmark below (the picture changes when X is larger, though).

Thanks!

data(EuStockMarkets)
X <- EuStockMarkets[1:100, 1:3]
Y <- EuStockMarkets[1:100, 4]


library(microbenchmark)  
microbenchmark(lm.fit = c(crossprod(lm.fit(X, Y)$residuals)),
               .lm.fit = c(crossprod(.lm.fit(X, Y)$residuals)),
               qr_res = c(crossprod(qr.resid(qr(X), Y))),
               times = 10)
#> Unit: microseconds
#>     expr    min     lq    mean  median     uq     max neval
#>   lm.fit 44.566 45.704 97.3347 47.0335 51.143 543.285    10
#>  .lm.fit 11.144 11.995 15.0783 13.1285 13.856  35.594    10
#>   qr_res 49.495 51.841 79.9043 53.4505 55.014 317.989    10

Created on 2018-10-20 by the reprex package (v0.2.1)
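
On the algebra side of question 1, note that with the thin QR ($Q$ being $n \times p$) one never has to form the $n \times n$ matrix $QQ'$, since $SSR = Y'Y - (Q'Y)'(Q'Y)$. A quick numerical confirmation of the identity (in Python/numpy rather than R, purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Thin QR: Q is n x p, so Q'y is just a length-p vector
Q, R = np.linalg.qr(X)
qty = Q.T @ y
ssr_qr = y @ y - qty @ qty

# Reference: residuals from a standard least squares solve
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr_ref = np.sum((y - X @ beta) ** 2)
print(np.allclose(ssr_qr, ssr_ref))   # True

In R, qr.qty() computes $Q'Y$ from the compact QR object without forming $Q$ explicitly, so the same trick should be available there.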