Trying to find a more stable non-linear least squares method (LSQ) than the one currently used in Newstar, I found another few areas that could be added or improved in the existing routines. In the end I produced a set of routines to handle linear and non-linear cases, both with or without SVD, and constraint equations.
In addition, the present document describes the usage and background of the different routines.
Least squares solutions in Aips++ have the following general background:
The following terminology is used throughout:
$n$: number of unknowns to be solved
$m$: number of simultaneous knowns
$N$: number of condition equations
$\chi^2$: merit function
$x^*$: complex conjugate of $x$
$\Re x$: real part of $x$
$\Im x$: imaginary part of $x$
$\bar{x}$: column vector of $n$ unknowns $x_j$
$\bar{a}_i$: vector of $n$ factors $a_{ij}$ in condition equation $i$ ($i = 0, \ldots, N-1$)
$A$: the $N \times n$ array of condition equations
$\alpha$: $n \times n$ array: normal equations matrix
$\bar{b}$: $n$ column vector: right-hand side normal equations
$y_i$: model value for condition equation $i$ ($y_i = \bar{a}_i \cdot \bar{x}$)
$l_i$: measured value for condition equation $i$ ($i = 0, \ldots, N-1$)
$\sigma_i$: standard deviation for condition equation $i$ ($\sigma_i^2 = 1/w_i$)
$w_i$: weight for condition equation $i$ ($w_i = 1/\sigma_i^2$)
$\sum$: shorthand for $\sum_{i=0}^{N-1}$
The condition equations can, with the above definitions, be written in one of the following ways:
$$\bar{a}_i \cdot \bar{x} = l_i \qquad (i = 0, \ldots, N-1) \tag{1}$$
or as:
$$\sum_{j=0}^{n-1} a_{ij}\,x_j = l_i \qquad (i = 0, \ldots, N-1) \tag{2}$$
or as:
$$A\,\bar{x} = \bar{l} \tag{3}$$
The normal matrix is defined as:
$$\alpha = A^{T}\,W\,A \tag{4}$$
resulting in the normal equations:
$$A^{T}\,W\,A\,\bar{x} = A^{T}\,W\,\bar{l} \tag{5}$$
where $W = \mathrm{diag}(w_0, \ldots, w_{N-1})$ and $W^{-1}$ is the covariance matrix of the observations. Here only covariance matrices with the same value in each column (the 'weight' $w_i$) are considered.
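As an illustration of (1)-(5), a minimal Python/numpy sketch follows; all data and variable names are made up for illustration and are not part of the actual routines:

```python
import numpy as np

# Sketch of equations (1)-(5): N condition equations in n unknowns.
# All data below are made up for illustration.
rng = np.random.default_rng(1)
N, n = 20, 3
A = rng.normal(size=(N, n))                  # condition equations, (3)
x_unknown = np.array([2.0, -1.0, 0.5])       # the 'true' unknowns
sigma = np.full(N, 0.1)                      # standard deviations sigma_i
l = A @ x_unknown + rng.normal(scale=sigma)  # measured values l_i
w = 1.0 / sigma**2                           # weights w_i = 1/sigma_i^2

alpha = A.T @ (w[:, None] * A)               # normal matrix (4)
b = A.T @ (w * l)                            # right-hand side of (5)
```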
The merit function we want to minimise is:
$$\chi^2 = \sum w_i\,(l_i - \bar{a}_i \cdot \bar{x})^2 \tag{6}$$
For the minimum of $\chi^2$ the following holds:
$$\frac{\partial \chi^2}{\partial x_j} = 0 \qquad (j = 0, \ldots, n-1) \tag{7}$$
which leads to the following set of normal equations:
$$\sum w_i\,(l_i - \bar{a}_i \cdot \bar{x})\,a_{ij} = 0 \qquad (j = 0, \ldots, n-1) \tag{8}$$
or in matrix form:
$$\left(\sum w_i\,\bar{a}_i^{T}\,\bar{a}_i\right)\bar{x} = \sum w_i\,l_i\,\bar{a}_i^{T} \tag{9}$$
or:
$$\alpha\,\bar{x} = \bar{b} \tag{10}$$
This symmetric set of equations can be solved, if the matrix $\alpha$ is positive definite, i.e. if there are no dependencies between the columns. The solution is given by:
$$\bar{x} = \alpha^{-1}\,\bar{b} \tag{11}$$
After solution for the unknowns $\bar{x}$, $\chi^2$ as defined in (6) can be directly calculated if we rewrite it as:
$$\chi^2 = \sum w_i\,l_i^2 - 2\sum w_i\,l_i\,(\bar{a}_i \cdot \bar{x}) + \sum w_i\,(\bar{a}_i \cdot \bar{x})^2 \tag{12}$$
Noting that $\sum w_i\,l_i\,(\bar{a}_i \cdot \bar{x}) = \bar{b} \cdot \bar{x}$, and using equation (9) to note that $\sum w_i\,(\bar{a}_i \cdot \bar{x})^2 = \bar{b} \cdot \bar{x}$ as well, we can rewrite (12) as:
$$\chi^2 = \sum w_i\,l_i^2 - \bar{b} \cdot \bar{x} \tag{13}$$
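Continuing the sketch above, the solution (11) and the equivalence of (6) and (13) can be checked numerically:

```python
x = np.linalg.solve(alpha, b)              # solution (11)

chi2_direct = np.sum(w * (l - A @ x)**2)   # merit function (6)
chi2_short = np.sum(w * l**2) - b @ x      # shortcut (13)
assert np.isclose(chi2_direct, chi2_short)
```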
The $\chi^2$ could be used to assess the goodness of fit if the actual $\sigma_i$'s were known and the errors are normally distributed. In general the actual values of $\sigma_i$ are not known, and often the distribution is not normal (e.g. if we solve for logarithmic values). An estimate of the standard deviation can be made by:
$$\sigma^2 \approx \frac{1}{N-n}\,\sum w_i\,(l_i - \bar{a}_i \cdot \bar{x})^2 \tag{14}$$
which can be estimated by:
$$\sigma_o^2 = \frac{\sum w_i\,l_i^2 - \bar{b} \cdot \bar{x}}{N-n} = \frac{\chi^2}{N-n} \tag{15}$$
to give ‘an error per observation’. The ‘error per unit weight’, or the standard deviation, can be expressed as:
$$\sigma_w^2 = \frac{N}{\sum w_i}\,\frac{\chi^2}{N-n} \tag{16}$$
The uncertainty in the solution can be expressed as:
$$\sigma^2(x_j) = \sum_i \sigma_i^2 \left(\frac{\partial x_j}{\partial l_i}\right)^2 \tag{17}$$
Since
$$x_j = \sum_k \alpha^{-1}_{jk}\,b_k = \sum_k \alpha^{-1}_{jk}\left(\sum_i w_i\,l_i\,a_{ik}\right) \tag{18}$$
we have:
$$\frac{\partial x_j}{\partial l_i} = \sum_k \alpha^{-1}_{jk}\,w_i\,a_{ik} \tag{19}$$
leading to (using $\sigma_i^2\,w_i^2 = w_i$):
$$\sigma^2(x_j) = \sum_k \sum_m \alpha^{-1}_{jk}\,\alpha^{-1}_{jm}\left(\sum_i w_i\,a_{ik}\,a_{im}\right) \tag{20}$$
Doing the sums, and noting that $\sum_i w_i\,a_{ik}\,a_{im} = \alpha_{km}$, this equation reduces to:
$$\sigma^2(x_j) = \left(\alpha^{-1}\,\alpha\,\alpha^{-1}\right)_{jj} = \alpha^{-1}_{jj} \tag{21}$$
If the $\sigma_i$ were not known originally, the estimate of the standard uncertainties in the unknowns is:
$$\sigma^2(x_j) = \frac{\chi^2}{N-n}\,\alpha^{-1}_{jj} \tag{22}$$
If the uncertainties in the unknowns are wanted, the inverse matrix $\alpha^{-1}$ is calculated by backsubstitution of the identity matrix, and the uncertainties follow from (22).
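Continuing the same sketch, a minimal version of the error estimates (15)-(16) and the uncertainties (22); the variable names, and the exact forms of (15), (16) and (22) as given above, are assumptions:

```python
# Continuing the linear sketch above.
dof = N - n                                    # degrees of freedom
sigma_o2 = chi2_short / dof                    # 'error per observation', (15)
sigma_w2 = chi2_short * N / (dof * np.sum(w))  # 'error per unit weight', (16)

# Uncertainties in the unknowns, (21)/(22); in the real routines the
# inverse is obtained by backsubstitution of the identity matrix.
alpha_inv = np.linalg.inv(alpha)
sigma_x = np.sqrt(sigma_o2 * np.diag(alpha_inv))
```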
The merit function we want to minimise in this case is:
$$\chi^2 = \sum w_i\,(l_i - \bar{a}_i \cdot \bar{x})\,(l_i - \bar{a}_i \cdot \bar{x})^* \tag{23}$$
Differentiating leads to the following set of normal equations:
$$\left(\sum w_i\,\bar{a}_i^{*T}\,\bar{a}_i\right)\bar{x} = \sum w_i\,l_i\,\bar{a}_i^{*T} \tag{24}$$
The normal matrix is Hermitian and can be solved by Cholesky methods. However, internally the matrix is converted to a real form. Although this has, in general, a small memory penalty, it has no influence on CPU time, and it makes it possible to use the same routines for complex and real solutions. The conversion to real is done by splitting each element of the normal matrix into $a + ib$ and replacing it by:
$$a + ib \rightarrow \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \tag{25}$$
and similar replacements for the vector elements $x_j$ and $b_j$ as:
$$x_j = \Re x_j + i\,\Im x_j \rightarrow \begin{pmatrix} \Re x_j \\ \Im x_j \end{pmatrix} \tag{26}$$
and:
$$b_j = \Re b_j + i\,\Im b_j \rightarrow \begin{pmatrix} \Re b_j \\ \Im b_j \end{pmatrix} \tag{27}$$
Another reason for solving real rather than complex equations is given in the next section.
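The conversion (25)-(27) can be sketched in numpy as follows; the interleaved (real, imaginary) layout is an assumption for illustration, and the result is checked against a direct complex solve:

```python
import numpy as np

# Complex -> real conversion (25)-(27): each complex element a+ib of the
# normal matrix becomes the 2x2 real block [[a, -b], [b, a]], and complex
# vectors are split into interleaved (Re, Im) pairs.
def complex_to_real_matrix(alpha_c):
    n = alpha_c.shape[0]
    out = np.empty((2 * n, 2 * n))
    out[0::2, 0::2] = alpha_c.real
    out[0::2, 1::2] = -alpha_c.imag
    out[1::2, 0::2] = alpha_c.imag
    out[1::2, 1::2] = alpha_c.real
    return out

def complex_to_real_vector(v_c):
    out = np.empty(2 * v_c.shape[0])
    out[0::2], out[1::2] = v_c.real, v_c.imag
    return out

# Verify against a direct complex solve on a random Hermitian system.
rng = np.random.default_rng(2)
M = rng.normal(size=(5, 3)) + 1j * rng.normal(size=(5, 3))
alpha_c = M.conj().T @ M                   # Hermitian, positive definite
b_c = M.conj().T @ rng.normal(size=5)
x_c = np.linalg.solve(alpha_c, b_c)
x_r = np.linalg.solve(complex_to_real_matrix(alpha_c),
                      complex_to_real_vector(b_c))
assert np.allclose(x_r[0::2] + 1j * x_r[1::2], x_c)
```

Note that for a Hermitian $\alpha$ the converted matrix is real and symmetric, which is what allows the real routines to be reused.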
In cases where both the unknowns and their complex conjugates appear in the condition equations, differentiating the merit function (23) will not produce a symmetric or Hermitian normal matrix, since there exists no linear relation between $x$ and $x^*$. We could, of course, solve for $2n$ complex unknowns with added constraints that the sum of corresponding even and odd unknowns must be real, and their difference imaginary, but this would lead to complex constraint equations.
If, however, we write the condition equations as $\sum_j (a_{ij}\,x_j + c_{ij}\,x_j^*) = l_i$, with residuals $r_i = l_i - \sum_j (a_{ij}\,x_j + c_{ij}\,x_j^*)$, and consider each complex unknown as two real unknowns (i.e. $\Re x_j$ and $\Im x_j$), then differentiating (23) produces the following symmetric set of real equations:
$$\sum_i w_i\,\Re\!\big[(a_{ij} + c_{ij})\,r_i^*\big] = 0, \qquad \sum_i w_i\,\Im\!\big[(a_{ij} - c_{ij})\,r_i^*\big] = 0 \qquad (j = 0, \ldots, n-1) \tag{28}$$
A number of special cases can be distinguished.
In the ‘normal’ complex case (previous section), $c_{ij} = 0$, and (28) reduces to (25).
If all $a_{ij}$ and $l_i$ are real, but specified, e.g., as complex numbers, (28) deteriorates into:
$$\sum_i w_i\,a_{ij}\,\Re r_i = 0, \qquad \sum_i w_i\,a_{ij}\,\Im r_i = 0 \qquad (j = 0, \ldots, n-1) \tag{29}$$
Note that in this case there is a true separation of the real and imaginary parts of the different unknowns, and two separate real solutions with $n$ unknowns each will produce exactly the same result.
The full and special cases are all catered for in the aips++ routines.
Using arguments comparable to those in (2.1.1), we can get (13) for the complex case as:
$$\chi^2 = \sum w_i\,l_i\,l_i^* - \bar{b} \cdot \bar{x} \tag{30}$$
with the $b_j$'s and $x_j$'s as defined above.
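A sketch of the separable case, using the $a_{ij}$, $c_{ij}$ notation assumed above (unit weights, made-up data); each complex unknown is carried as the real pair $(\Re x_j, \Im x_j)$:

```python
import numpy as np

# Condition equations sum_j (a_ij x_j + c_ij x_j*) = l_i, containing both
# the unknowns and their conjugates, solved as 2n real unknowns.
rng = np.random.default_rng(3)
N, n = 12, 2
a = rng.normal(size=(N, n)) + 1j * rng.normal(size=(N, n))
c = rng.normal(size=(N, n)) + 1j * rng.normal(size=(N, n))
x_true = rng.normal(size=n) + 1j * rng.normal(size=n)
l = a @ x_true + c @ x_true.conj()

# Real design matrix for unknowns (Re x_0, Im x_0, Re x_1, Im x_1, ...).
A = np.empty((2 * N, 2 * n))
A[0::2, 0::2] = (a + c).real
A[0::2, 1::2] = (c - a).imag
A[1::2, 0::2] = (a + c).imag
A[1::2, 1::2] = (a - c).real
rhs = np.empty(2 * N)
rhs[0::2], rhs[1::2] = l.real, l.imag

alpha = A.T @ A                  # symmetric 2n x 2n normal matrix, cf. (28)
b = A.T @ rhs
x = np.linalg.solve(alpha, b)
assert np.allclose(x[0::2] + 1j * x[1::2], x_true)
```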
If there are not enough independent condition equations, the normal matrix cannot be inverted, and a call to WNMLTN will fail with a .false. return value.
The equations could still be solved if some additional 'constraint' equations were introduced. In the more complex cases the precise, let alone the best, form for these additional equations is difficult to determine (e.g. the redundancy situation in Westerbork).
A method known as ‘Singular value decomposition’ (SVD) can be used to obtain the minimal set of orthogonal equations that have to be added to solve the LSQ problem. Several implementations exist in the literature.
In general we can distinguish three types of constrained equations:
All three cases are handled in the LSQ package, the first two in the same way.
The general constraint situation arises from the use of Lagrange multipliers. Assume that in addition to the condition equations, with measured values $l_i$, we have a set of rigorous equations:
$$\bar{c}_k \cdot \bar{x} = 0 \qquad (k = 0, \ldots, p-1) \tag{31}$$
We must therefore make $\chi^2$ minimal, subject to the set of (31), or:
$$\sum w_i\,(l_i - \bar{a}_i \cdot \bar{x})^2 + 2\sum_{k=0}^{p-1} \lambda_k\,(\bar{c}_k \cdot \bar{x}) = \text{minimal} \tag{32}$$
subject to the conditions:
$$\frac{\partial}{\partial x_j}\left[\chi^2 + 2\sum_{k=0}^{p-1} \lambda_k\,(\bar{c}_k \cdot \bar{x})\right] = 0 \qquad (j = 0, \ldots, n-1) \tag{33}$$
which leads to a set of equations:
$$\sum_i w_i\,(l_i - \bar{a}_i \cdot \bar{x})\,a_{ij} - \sum_{k=0}^{p-1} \lambda_k\,c_{kj} = 0 \qquad (j = 0, \ldots, n-1) \tag{34}$$
together with the equations (31).
Note that I have chosen to have constraint equations linear in the unknowns, with a zero value. In cases where this is not adequate (e.g. $\bar{c}_k \cdot \bar{x} = d_k$ with $d_k \neq 0$), a simple linear transformation of the unknowns will suffice to make it zero-valued (e.g. $\bar{c}_k \cdot \bar{x}' = 0$).
Defining the second term in (34) as $(G\,\bar{\lambda})_j$, with $G$ the $n \times p$ matrix whose columns are the $\bar{c}_k$, we can write our expanded set of normal equations as:
$$\begin{pmatrix} \alpha & G \\ G^{T} & 0 \end{pmatrix} \begin{pmatrix} \bar{x} \\ \bar{\lambda} \end{pmatrix} = \begin{pmatrix} \bar{b} \\ 0 \end{pmatrix} \tag{35}$$
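A small numerical sketch of (35), with a deliberately rank-deficient normal matrix and one known constraint (the example data and names are illustrative):

```python
import numpy as np

# Three unknowns that only appear as differences: alpha has a rank defect
# of 1, fixed by the constraint sum(x) = 0.
rng = np.random.default_rng(4)
N, n = 10, 3
D = rng.normal(size=(N, n))
D -= D.mean(axis=1, keepdims=True)     # rows sum to 0: rank defect of 1
x_true = np.array([1.0, -2.0, 1.0])    # chosen with sum(x) = 0
l = D @ x_true
alpha, b = D.T @ D, D.T @ l            # singular normal equations
G = np.ones((n, 1))                    # constraint: sum(x) = 0

# Bordered (expanded) system of (35); Cholesky no longer applies, so a
# general LU-based solve is used here.
top = np.hstack([alpha, G])
bottom = np.hstack([G.T, np.zeros((1, 1))])
sol = np.linalg.solve(np.vstack([top, bottom]), np.append(b, 0.0))
x, lam = sol[:n], sol[n:]
assert np.allclose(x, x_true)
```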
The routines to handle this situation use a method based on "Pseudo-inverse berekening en Cholesky factorisatie" (pseudo-inverse computation and Cholesky factorisation), Hans van der Marel, 27 March 1990, Faculty of Geodesy, Delft University. I will briefly describe Hans' paper, together with the additions I have made.
In the case we are considering, the normal matrix $\alpha$ does not have full column rank. Therefore there exists, if the rank defect is $p$, an $n \times p$ matrix $G$, a basis for the null-space of $\alpha$, with $\alpha\,G = 0$. If we assume that (35) has just sufficient constraint equations to solve the rank defect, the inverse of the matrix in (35) is:
$$\begin{pmatrix} \alpha & G \\ G^{T} & 0 \end{pmatrix}^{-1} = \begin{pmatrix} \alpha^{+} & G\,(G^{T}G)^{-1} \\ (G^{T}G)^{-1}\,G^{T} & 0 \end{pmatrix} \tag{36}$$
with $\alpha^{+}$ the pseudo-inverse of $\alpha$. A number of relations hold for $\alpha^{+}$, the most important for us being that $\alpha\,\alpha^{+}\,\alpha = \alpha$ and $\alpha^{+}\,G = 0$, or: $G$ is a base for the null-space of $\alpha^{+}$ as well. For a minimum-norm solution we can choose the constraint equations equal to $G^{T}\,\bar{x} = 0$. In that case $\alpha^{+}$ is the Moore-Penrose inverse, and $\alpha + G\,G^{T}$ is regular and symmetric.
We can now proceed in the following way:
If the rank of $\alpha$ is $r = n - p$, we can say that after $r$ Cholesky steps we have:
$$\alpha = \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{12}^{T} & \alpha_{22} \end{pmatrix} = \begin{pmatrix} U_{11}^{T} & 0 \\ U_{12}^{T} & I \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \\ 0 & 0 \end{pmatrix} \tag{37}$$
with $U_{11}$ the Cholesky factor of $\alpha_{11}$, and $U_{12} = U_{11}^{-T}\,\alpha_{12}$.
The collinearity angle $\psi$ can be determined for the current column $j$ from the ratio of the remaining pivot $d_j$ to the original diagonal element: $\sin^2\psi = d_j / \alpha_{jj}$.
Setting $\bar{G}_1 = -U_{11}^{-1}\,U_{12}$, and $G = \begin{pmatrix} \bar{G}_1 \\ I \end{pmatrix}$, we can write the above steps as:
$$\alpha\,G = \begin{pmatrix} U_{11}^{T}\,U_{11} & U_{11}^{T}\,U_{12} \\ U_{12}^{T}\,U_{11} & U_{12}^{T}\,U_{12} \end{pmatrix} \begin{pmatrix} -U_{11}^{-1}\,U_{12} \\ I \end{pmatrix} = 0 \tag{38}$$
Errors are determined in the same way as described in (2.1.1), where it should be noted that (13) can either be used summing over the $n$ unknowns and using the first guess of $\bar{x}$, or over all $n + p$ unknowns and using the final $\bar{x}$; since $\bar{b}$ has no component in the null-space, both give the same value. Internally the first option is used. The uncertainties in $\bar{x}$ are determined by calculating the covariance matrix. Using similar arithmetic as in (2.1.1), it can be shown that:
$$\sigma^2(x_j) = \frac{\chi^2}{N-n+p}\,\alpha^{+}_{jj} \tag{39}$$
$\alpha^{+}$ is calculated by first solving $(\alpha + G\,G^{T})^{-1}$ and then projecting out the null-space: $\alpha^{+} = P\,(\alpha + G\,G^{T})^{-1}\,P$, with $P = I - G\,(G^{T}G)^{-1}\,G^{T}$.
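Continuing the constrained sketch above, the same solution follows from the pseudo-inverse, here computed with numpy's SVD-based pinv rather than the Cholesky-based scheme of the paper; the projection identity for $\alpha^{+}$ is also checked:

```python
# Continuing the rank-deficient example: G spans the null-space of alpha.
alpha_pinv = np.linalg.pinv(alpha)           # Moore-Penrose inverse
assert np.allclose(alpha_pinv @ b, x_true)   # minimum-norm solution
assert np.allclose(alpha_pinv @ G, 0.0)      # G is a base for null(alpha+)

# alpha+ from a regular matrix plus projections, cf. (36):
P = np.eye(n) - G @ np.linalg.inv(G.T @ G) @ G.T
assert np.allclose(P @ np.linalg.inv(alpha + G @ G.T) @ P, alpha_pinv)
```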
In the case of known constraints, hence when $G$ is known in (35), this equation can be solved. Cholesky decomposition does not work in this case, and a transparent Crout LU decomposition is done to determine the (symmetric) covariance (i.e. inverse) matrix of the left-hand side of (35).
Errors are determined as explained in (2.1.1). Since all constraint equations are zero-valued, the sum in (13) is taken over $n + p$, not over $n$, if $p$ is the number of constraint equations.
In the case of non-linear condition equations, no simple solution to minimise the merit function (e.g. (6)) exists. A simple approach is to take the condition equations, make a first-order Taylor expansion around an estimated $\bar{x}_0$, solve for $\Delta\bar{x}$, and iterate until a solution is found.
However, better and more stable methods exist (e.g. steepest descent) for some circumstances. A method that seems to be quite stable, using both steepest descent and a Taylor expansion, is the Levenberg-Marquardt method (see e.g. "Numerical Recipes", W.H. Press et al., Cambridge University Press).
If we have an estimate for $\bar{x}$, we can find a better one by:
$$\bar{x}_{\mathrm{new}} = \bar{x} - H^{-1}\,\nabla\chi^2(\bar{x}) \tag{40}$$
where $H$ is the Hessian matrix of $\chi^2$. If our approximation is not good enough for (40) to hold, we could use steepest descent by calculating:
$$\bar{x}_{\mathrm{new}} = \bar{x} - \lambda\,\nabla\chi^2(\bar{x}) \tag{41}$$
The Hessian matrix has as elements:
$$H_{jk} = 2\sum_i w_i\left[\frac{\partial y_i}{\partial x_j}\,\frac{\partial y_i}{\partial x_k} - (l_i - y_i)\,\frac{\partial^2 y_i}{\partial x_j\,\partial x_k}\right] \tag{42}$$
By dropping the second term in (42), multiplying each diagonal term with $(1 + \lambda)$, and defining the first derivative of $y_i$ as $a_{ij} = \partial y_i / \partial x_j$, we can combine the equations (40), (41) into:
$$\sum_k \alpha_{jk}\,(1 + \lambda\,\delta_{jk})\,\Delta x_k = \sum_i w_i\,(l_i - y_i)\,a_{ij} \qquad (j = 0, \ldots, n-1) \tag{43}$$
which can be solved as standard normal equations.
Choosing $\lambda$ is the crux of the matter, as is deciding where to stop iterating. A start value of $\lambda = 0.001$ is used in the following routines. The method looks at the $\chi^2$ for $\bar{x} + \Delta\bar{x}$. If it has increased over the value of $\chi^2$ for $\bar{x}$, $\bar{x}$ is left unchanged and $\lambda$ is multiplied by 10. If there is a decrease, the new value $\bar{x} + \Delta\bar{x}$ is used, and $\lambda$ is divided by 10. Iteration can be stopped if the fractional decrease in $\chi^2$ is small (say $< 0.001$); never if there is an increase in $\chi^2$.
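The iteration scheme can be sketched as follows; the toy exponential model, the data and the tolerances are made up for illustration, and the actual routines may differ in detail:

```python
import numpy as np

# Levenberg-Marquardt iteration (40)-(43) on a toy model
# y_i = exp(-x0 * t_i) + x1 (an assumed example).
rng = np.random.default_rng(5)
t = np.linspace(0.0, 4.0, 30)
x_true = np.array([1.3, 0.4])
w = np.full(t.size, 100.0)                     # weights 1/sigma^2
l = np.exp(-x_true[0] * t) + x_true[1] + rng.normal(scale=0.1, size=t.size)

def model(x):
    return np.exp(-x[0] * t) + x[1]

def jacobian(x):                               # a_ij = d y_i / d x_j
    return np.column_stack([-t * np.exp(-x[0] * t), np.ones_like(t)])

def chi2(x):
    return np.sum(w * (l - model(x))**2)

x, lam = np.array([0.5, 0.0]), 0.001           # start value lambda = 0.001
for _ in range(100):
    A = jacobian(x)
    r = l - model(x)
    alpha = A.T @ (w[:, None] * A)
    alpha[np.diag_indices_from(alpha)] *= 1.0 + lam   # damp diagonal, (43)
    dx = np.linalg.solve(alpha, A.T @ (w * r))
    if chi2(x + dx) < chi2(x):
        old, x, lam = chi2(x), x + dx, lam / 10.0     # accept step
        if (old - chi2(x)) / old < 1e-6:              # small fractional decrease
            break
    else:
        lam *= 10.0                                   # reject step
```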
Errors are calculated as described in (2.1.1). The errors returned by WNMLSN are, of course, only approximations, since the original equations are non-linear, but they give a good impression of the residuals. The covariance matrix should be calculated by doing a final linear run on the residuals and solving the equations.
An LSQ-object takes a total space that depends on $n$, the number of unknowns ($2n$ if complex); $m$, the number of simultaneously solved equations; and $p$, the number of external constraint equations. In the non-linear case two LSQ-objects are used. In cases where the inverted normal array is calculated (known constraints), a temporary storage of order $n^2$ 8-byte words is used.
The following calls are available:
wnb, August 5, 2015