NOTE 224 – AIPS++ Least Squares background

Wim Brouw

22 January 1999

Contents

1 Introduction
2 Linear equations
 2.1 Real
  2.1.1 Errors
 2.2 Complex
  2.2.1 Separable complex
  2.2.2 Errors
3 Dependent linear equations
 3.1 Unknown constraints
  3.1.1 Errors
 3.2 Known constraints
  3.2.1 Errors
4 Non-linear equations
 4.1 Errors
5 Summary

1 Introduction

Trying to find a more stable non-linear least squares (LSQ) method than the one currently used in Newstar, I found a few other areas in which the existing routines could be improved or extended. In the end I produced a set of routines that handle both the linear and the non-linear case, with or without SVD, and with constraint equations.

The present document describes the usage and background of the different routines.

Least squares solutions in Aips++ have the following general background:

  1. all are based on creating normal equations from an, initially unknown, number of condition equations
  2. the actual LSQ-object is a symmetric (or Hermitian) matrix
  3. matrix inversions are done using in-place Cholesky methods where possible: for standard use this is the fastest, least memory-consuming and a stable method. In cases where the inverse matrix is needed (only if explicit constraint equations are present), resources of the same order as a full LU-decomposition are used
  4. input/output to the LSQ-object can be done in any precision; internally all calculations are done in double precision
  5. complex numbers are assumed to be contiguous pairs of real numbers, with the real part first

The following terminology is used throughout:

$n$
    number of unknowns to be solved
$m$
    number of simultaneous knowns
$N$
    number of condition equations
$\chi^2$
    merit function
$\bar{z}$
    complex conjugate of $z$
$\Re z$
    real part of $z$
$\Im z$
    imaginary part of $z$
$\mathbf{x}$
    column vector of unknowns $x_0,\ldots,x_{n-1}$
$\mathbf{a}_i$
    vector of factors $a_{0,i},\ldots,a_{n-1,i}$ in condition equation $i$ ($i = 0,\ldots,N-1$)
$C$
    the $N \times n$ array of condition equations
$A$
    $n \times n$ array: normal equations matrix
$\mathbf{L}$
    $n$ column vector: right-hand side of the normal equations
$y_i$
    model value for condition equation $i$ ($i = 0,\ldots,N-1$)
$l_i$
    measured value for condition equation $i$ ($i = 0,\ldots,N-1$)
$\sigma_i$
    standard deviation for condition equation $i$ ($i = 0,\ldots,N-1$)
$w_i$
    weight for condition equation $i$ ($w_i = 1/\sigma_i^2$)
$[ab]$
    shorthand for $\sum_{i=0}^{N-1} w_i a_i b_i$

The condition equations can, with the above definitions, be written in one of the following ways:

$\mathbf{a}_i \cdot \mathbf{x} = l_i \qquad i = 0,\ldots,N-1$  (1)

or as:

$\sum_{k=0}^{n-1} a_{ik}\, x_k = l_i \qquad i = 0,\ldots,N-1$  (2)

or as:

$C\,\mathbf{x} = \mathbf{l}$  (3)

The normal matrix is defined as:

$A = C^T\, Q^{-1}\, C$  (4)

resulting in the normal equations:

$A\,\mathbf{x} = \mathbf{L}$  (5)

where $\mathbf{L} = C^T Q^{-1} \mathbf{l}$ and $Q$ is the covariance matrix of the observations. Here only diagonal covariance matrices are considered, i.e. each condition equation $i$ carries an individual ‘weight’ $w_i$.
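To make the bookkeeping concrete, the following sketch accumulates weighted condition equations into normal equations one at a time, as the LSQ routines do, so that $N$ need not be known in advance. This is illustrative Python only, not the aips++ interface; all names are invented for the example.

```python
import numpy as np

def make_norm(A, L, a, l, w=1.0):
    """Add one weighted condition equation a.x = l to the normal
    equations: a rank-1 update of A and L (cf. equations (4)-(5))."""
    A += w * np.outer(a, a)
    L += w * a * l

n = 3                        # number of unknowns
A = np.zeros((n, n))         # normal matrix   A = C^T Q^-1 C
L = np.zeros(n)              # right-hand side L = C^T Q^-1 l

# three noiseless condition equations whose solution is x = (1, 2, 3)
for a, l in [([1.0, 0.0, 1.0], 4.0),
             ([0.0, 1.0, 1.0], 5.0),
             ([1.0, 1.0, 0.0], 3.0)]:
    make_norm(A, L, np.asarray(a), l)
```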

2 Linear equations

2.1 Real

The merit function we want to minimise is:

$\chi^2 = \sum_{i=0}^{N-1} \left( \frac{l_i - y_i}{\sigma_i} \right)^2$  (6)

At the minimum of $\chi^2$ we have:

$\frac{\partial \chi^2}{\partial x_k} = 0 \qquad k = 0,\ldots,n-1$  (7)

which leads to the following set of normal equations:

$\sum_{i=0}^{N-1} \frac{l_i - y_i}{\sigma_i^2}\, \frac{\partial y_i}{\partial x_k} = 0 \qquad k = 0,\ldots,n-1$  (8)

or in matrix form:

$\begin{pmatrix} [a_0a_0] & [a_0a_1] & \cdots & [a_0a_{n-1}] \\ [a_1a_0] & [a_1a_1] & \cdots & [a_1a_{n-1}] \\ \vdots & \vdots & & \vdots \\ [a_{n-1}a_0] & [a_{n-1}a_1] & \cdots & [a_{n-1}a_{n-1}] \end{pmatrix} \mathbf{x} = \begin{pmatrix} [a_0l] \\ [a_1l] \\ \vdots \\ [a_{n-1}l] \end{pmatrix}$  (9)

or:

$A\,\mathbf{x} = \mathbf{L}$  (10)

This symmetric set of equations can be solved, if the matrix is positive definite, i.e. if there are no dependencies between the columns. The solution is given by:

$\mathbf{x} = A^{-1}\, \mathbf{L}$  (11)
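The solution of the symmetric positive-definite system (10) by Cholesky factorisation might look as follows, assuming the $A$ and $\mathbf{L}$ accumulated in the earlier sketch; numpy's generic solver stands in for the in-place forward and back substitution the package performs on triangular storage.

```python
import numpy as np

def solve_chol(A, L):
    """Solve A x = L via the Cholesky factor (A = G G^T, G lower
    triangular); the two triangular systems are solved in turn."""
    G = np.linalg.cholesky(A)
    y = np.linalg.solve(G, L)       # forward substitution (generic solve here)
    return np.linalg.solve(G.T, y)  # back substitution

x = solve_chol(A, L)                # for the example above: [1. 2. 3.]
```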

2.1.1 Errors

After solving for the unknowns $\mathbf{x}$, the $\chi^2$ defined in (6) can be calculated directly if we rewrite it as:

$\chi^2 = [ll] - 2 \sum_{k=0}^{n-1} [a_k l]\, x_k + \sum_{i=0}^{N-1} w_i\, y_i^2$  (12)

Noting that $y_i = \sum_{k=0}^{n-1} a_{ik}\, x_k$ and using equation (9) to note that $[a_i l] = \sum_{k=0}^{n-1} [a_i a_k]\, x_k$, we can rewrite (12) as:

$\chi^2 = [ll] - \sum_{k=0}^{n-1} x_k\, [a_k l]$  (13)

The $\chi^2$ could be used to assess the goodness of fit if the actual $\sigma_i$'s were known and the errors were normally distributed. In general the actual values of $\sigma_i$ are not known, and often the distribution is not normal (e.g. if we solve for logarithmic values). An estimate of the standard deviation can be made by:

$\sigma^2 = \frac{\sum_{i=0}^{N-1} \left( l_i - y_i \right)^2}{N - n}$  (14)

which can be estimated by:

$\sigma_o^2 = \frac{\chi^2}{N - n}$  (15)

to give ‘an error per observation’. The ‘error per unit weight’, or the standard deviation, can be expressed as:

$\sigma_w^2 = \frac{\chi^2}{[1]}\, \frac{N}{N - n}$  (16)

The uncertainty in the solution $x_i$ can be expressed as:

$\sigma^2(x_i) = \sum_{k=0}^{N-1} \sigma_k^2 \left( \frac{\partial x_i}{\partial y_k} \right)^2$  (17)

Since

$x_i = \sum_{k=0}^{n-1} A^{-1}_{ik}\, [a_k l]$  (18)

we have:

$\frac{\partial x_i}{\partial y_k} = \sum_{j=0}^{n-1} A^{-1}_{ij}\, \frac{a_{kj}}{\sigma_k^2}$  (19)

leading to:

$\sigma^2(x_i) = \sum_{k=0}^{n-1} \sum_{l=0}^{n-1} A^{-1}_{ik} A^{-1}_{il} \sum_{j=0}^{N-1} \frac{a_{kj}\, a_{lj}}{\sigma_j^2}$  (20)

Carrying out the sums (the innermost sum is just the normal-matrix element $A_{kl}$), this equation reduces to:

$\sigma^2(x_i) = A^{-1}_{ii}$  (21)

If the $\sigma_i$ were not known originally, the estimated standard uncertainties in the unknowns $x$ are:

$\sigma(x_i) = \sigma_o \sqrt{A^{-1}_{ii}}$  (22)

If the uncertainties in the unknowns are wanted, the inverse matrix $A^{-1}$ is calculated by back-substitution of the identity matrix, and the uncertainties follow from (22).
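A sketch of the error quantities of this subsection, under the same illustrative assumptions as the fragments above; the sums $[ll]$ and $[1]$ are assumed to have been accumulated alongside the normal equations, and none of the names are the aips++ API.

```python
import numpy as np

def lsq_errors(A, L, x, ll, sum_w, N, n):
    """Error estimates after a solution x; ll = [ll], sum_w = [1]."""
    chi2 = ll - x @ L                            # equation (13)
    sigma_o2 = chi2 / (N - n)                    # (15): error per observation
    sigma_w2 = chi2 * N / (sum_w * (N - n))      # (16): error per unit weight
    Ainv = np.linalg.inv(A)                      # back-substitution of identity
    sigma_x = np.sqrt(sigma_o2 * np.diag(Ainv))  # (22): uncertainties in x
    return sigma_o2, sigma_w2, sigma_x
```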

2.2 Complex

The merit function we want to minimise in this case is:

$\chi^2 = \sum_{i=0}^{N-1} \left( \frac{l_i - y_i}{\sigma_i} \right) \overline{\left( \frac{l_i - y_i}{\sigma_i} \right)}$  (23)

Differentiating $\chi^2$ leads to the following set of normal equations:

$\begin{pmatrix} [\bar{a}_0a_0] & [\bar{a}_0a_1] & \cdots & [\bar{a}_0a_{n-1}] \\ [\bar{a}_1a_0] & [\bar{a}_1a_1] & \cdots & [\bar{a}_1a_{n-1}] \\ \vdots & \vdots & & \vdots \\ [\bar{a}_{n-1}a_0] & [\bar{a}_{n-1}a_1] & \cdots & [\bar{a}_{n-1}a_{n-1}] \end{pmatrix} \mathbf{x} = \begin{pmatrix} [\bar{a}_0l] \\ [\bar{a}_1l] \\ \vdots \\ [\bar{a}_{n-1}l] \end{pmatrix}$  (24)

The normal matrix is Hermitian and can be solved by Cholesky methods. Internally, however, the matrix is converted to a real form. Although this carries, in general, a small memory penalty, it has no influence on CPU time, and it makes it possible to use the same routines for complex and real solutions. The conversion to real form is done by splitting each element $A_{ij}$ of the normal matrix into $\Re A_{ij} + \imath\, \Im A_{ij}$ and replacing it by:

$A_{ij} \rightarrow \begin{pmatrix} \Re A_{ij} & -\Im A_{ij} \\ \Im A_{ij} & \Re A_{ij} \end{pmatrix}$  (25)

with similar replacements for the vector elements $x_i$ and $L_i$:

$x_i \rightarrow \begin{pmatrix} \Re x_i \\ \Im x_i \end{pmatrix}$  (26)

and:

$L_i \rightarrow \begin{pmatrix} \Re L_i \\ \Im L_i \end{pmatrix}$  (27)

Another reason for solving real rather than complex equations is given in the next section.
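The conversion of a Hermitian system to its real form can be sketched as follows (plain Python loops for clarity; the actual routines work in place on packed triangular storage):

```python
import numpy as np

def complex_to_real(A, L):
    """Expand a Hermitian n x n system A x = L into the equivalent real
    2n x 2n system, replacing each element by the 2 x 2 block of (25)."""
    n = A.shape[0]
    Ar = np.empty((2 * n, 2 * n))
    Lr = np.empty(2 * n)
    for i in range(n):
        for j in range(n):
            Ar[2*i,   2*j]   =  A[i, j].real    # ( Re A_ij  -Im A_ij )
            Ar[2*i,   2*j+1] = -A[i, j].imag    # ( Im A_ij   Re A_ij )
            Ar[2*i+1, 2*j]   =  A[i, j].imag
            Ar[2*i+1, 2*j+1] =  A[i, j].real
        Lr[2*i], Lr[2*i+1] = L[i].real, L[i].imag   # as in (27)
    return Ar, Lr   # the real solution interleaves Re x_i, Im x_i, as in (26)
```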

2.2.1 Separable complex

In cases where both the unknowns $\mathbf{x}$ and their complex conjugates $\bar{\mathbf{x}}$ appear in the condition equations, differentiating the merit function (23) will not produce a symmetric or Hermitian normal matrix, since there exists no linear relation between $x_i$ and $\bar{x}_i$. We could, of course, solve for $2n$ complex unknowns, with added constraints that the sum of each even and odd pair must be real and their difference imaginary, but this would lead to $4n$ complex equations.

If, however, we consider each complex unknown as two real unknowns (i.e. $\Re x_i$ and $\Im x_i$), then differentiating (23) produces the following symmetric set of $2n$ real equations:

$\begin{pmatrix} [a_0a_0] & [a_1a_0] & [a_2a_0] & \cdots & [a_{2n-1}a_0] \\ [a_0a_1] & [a_1a_1] & [a_2a_1] & \cdots & [a_{2n-1}a_1] \\ \vdots & \vdots & \vdots & & \vdots \\ [a_0a_{2n-1}] & [a_1a_{2n-1}] & [a_2a_{2n-1}] & \cdots & [a_{2n-1}a_{2n-1}] \end{pmatrix} \begin{pmatrix} \Re x_0 \\ \Im x_0 \\ \Re x_1 \\ \vdots \\ \Im x_{n-1} \end{pmatrix} = \begin{pmatrix} [a_0l] \\ [a_1l] \\ [a_2l] \\ \vdots \\ [a_{2n-1}l] \end{pmatrix}$  (28)

A number of special cases can be distinguished.

In the ‘normal’ complex case (previous section), $a_{2i+1} = \imath\, a_{2i}$, and (28) reduces to (25).

If all the $a_{2i}$ and $a_{2i+1}$ are real, even if specified, e.g., as complex numbers, (28) degenerates into:

$\begin{pmatrix} [a_0a_0] & 0 & [a_0a_1] & \cdots & 0 \\ 0 & [a_0a_0] & 0 & \cdots & [a_0a_{n-1}] \\ [a_1a_0] & 0 & [a_1a_1] & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & [a_{n-1}a_0] & 0 & \cdots & [a_{n-1}a_{n-1}] \end{pmatrix} \begin{pmatrix} \Re x_0 \\ \Im x_0 \\ \Re x_1 \\ \vdots \\ \Im x_{n-1} \end{pmatrix} = \begin{pmatrix} [a_0\,\Re l] \\ [a_0\,\Im l] \\ [a_1\,\Re l] \\ \vdots \\ [a_{n-1}\,\Im l] \end{pmatrix}$  (29)

Note that in this case there is a true separation of the real and imaginary parts of the different unknowns, and two separate real solutions, of $n$ unknowns each, will produce exactly the same result.

The full and special cases are all catered for in the aips++ routines.
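As an illustration of the separable case, the sketch below builds the $2n$ real unknowns $(\Re x_k, \Im x_k)$ directly from condition equations containing both $x$ and $\bar{x}$: each complex condition equation $\sum_k (c_{ik} x_k + d_{ik} \bar{x}_k) = l_i$ is split into its real and imaginary parts, after which ordinary real normal equations are formed. The coefficient arrays `Ccoef` and `Dcoef` are invented for the example; this is not the aips++ interface.

```python
import numpy as np

def separable_solve(Ccoef, Dcoef, l):
    """Solve sum_k (c_ik x_k + d_ik conj(x_k)) = l_i, i = 0..N-1, for the
    2n real unknowns (Re x_0, Im x_0, ..., Im x_{n-1}), cf. equation (28)."""
    N, n = Ccoef.shape
    # complex factors of the real unknowns:
    #   c x + d conj(x) = (c + d) Re x + i (c - d) Im x
    a = np.empty((N, 2 * n), dtype=complex)
    a[:, 0::2] = Ccoef + Dcoef             # factors a_2k   (of Re x_k)
    a[:, 1::2] = 1j * (Ccoef - Dcoef)      # factors a_2k+1 (of Im x_k)
    # each complex condition equation yields two real ones
    Creal = np.vstack([a.real, a.imag])
    lreal = np.concatenate([np.asarray(l).real, np.asarray(l).imag])
    A = Creal.T @ Creal                    # real symmetric 2n x 2n
    L = Creal.T @ lreal
    return np.linalg.solve(A, L)           # (Re x_0, Im x_0, Re x_1, ...)
```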

2.2.2 Errors

Using arguments comparable to those in (2.1.1), we can get (13) for the complex case as:

$\chi^2 = [ll] - \sum_{k=0}^{n-1} \left( \Re x_k\, [a_{2k} l] + \Im x_k\, [a_{2k+1} l] \right)$  (30)

with the a’s and x’s as defined in (29).

3 Dependent linear equations

If there are not enough independent condition equations, the normal matrix A cannot be inverted, and a call to WNMLTN will fail with a .false. return value.

The equations can still be solved if some additional ‘constraint’ equations are introduced. In the more complex cases the precise, let alone the best, form of these additional equations is difficult to determine (e.g. the redundancy situation in Westerbork).

A method known as ‘Singular value decomposition’ (SVD) can be used to obtain the minimal set of orthogonal equations that have to be added to solve the LSQ problem. Several implementations exist in the literature.

In general we can distinguish three types of constraint equations. All three cases are handled in the LSQ package, the first two in the same way.

The general constraint situation arises from the use of Lagrange multipliers. Assume that, in addition to the condition equations with measured values, we have a set of $p$ rigorous equations:

$\phi_i(\mathbf{x}) = 0 \qquad i = 0,\ldots,p-1$  (31)

We must therefore make $\chi^2$ minimal, subject to the set (31), or:

$\sum_{i=0}^{n-1} \frac{\partial \chi^2}{\partial x_i}\, dx_i = 0$  (32)

subject to the conditions:

$\sum_{i=0}^{n-1} \frac{\partial \phi_k}{\partial x_i}\, dx_i = 0 \qquad k = 0,\ldots,p-1$  (33)

which leads to a set of n + p equations:

$\frac{\partial \chi^2}{\partial x_i} + \sum_{k=0}^{p-1} \lambda_k\, \frac{\partial \phi_k}{\partial x_i} = 0 \qquad i = 0,\ldots,n-1$  (34)

together with the equations (31).

Note that I have chosen constraint equations that are linear in the unknowns and have a zero right-hand side. In cases where this is not adequate (e.g. $x + y + z = 360$) a simple linear transformation suffices to make it, e.g., $x + y + z = 0$.

Defining the factors in the second term of (34) as $B_{ik} = \partial \phi_k / \partial x_i$, we can write our expanded set of normal equations as:

$\begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix} \begin{pmatrix} \mathbf{x} \\ \boldsymbol{\lambda} \end{pmatrix} = \begin{pmatrix} \mathbf{L} \\ 0 \end{pmatrix}$  (35)

3.1 Unknown constraints

The routines that handle this situation use a method based on “Pseudo-inverse berekening en Cholesky factorisatie” (Hans van der Marel, 27 March 1990, Faculty of Geodesy, Delft University). I will briefly describe Hans’ paper, together with the additions I have made.

In the case we are considering, the normal matrix $A$ does not have full column rank. Therefore, if the rank defect is $p$, there exists an $n \times p$ matrix $G$, a basis for the null space of $A$, with $AG = 0$. If we assume that $B$ contains just sufficient constraint equations to repair the rank defect, the inverse of the matrix in (35) is:

$\begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix}^{-1} = \begin{pmatrix} A^{\#} & G\,(B^T G)^{-1} \\ (G^T B)^{-1} G^T & 0 \end{pmatrix}$  (36)

with $A^{\#}$ a pseudo-inverse of $A$. A number of relations hold for $A^{\#}$; the most important for us are that $B^T A^{\#} = 0$ and $A^{\#} B = 0$, i.e. $B$ is a basis for the null space of $A^{\#}$. For a minimum-norm solution we can choose $B = G$. In that case $A^{\#}$ is the Moore-Penrose inverse, and $B^T G$ is regular and symmetric.

We can now proceed in the following way:

  1. Do a Cholesky factorisation of the normal matrix $A$ until the remaining columns of $A$ are dependent. Pivoting along the diagonal of $A$ is used to make sure that $A$ can be partitioned into an independent part and a dependent part with rank defect $n_2$. Dependency is determined by looking at the angle between a column and the space spanned by the columns already accepted as independent (the so-called ‘collinearity number’).

    If $n_1 = n - n_2$, then after $n_1$ Cholesky steps we have:

    $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} U_{11}^T & 0 \\ U_{12}^T & I \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \\ 0 & \bar{A}_{22} \end{pmatrix}$  (37)

    with $U_{11}$ the Cholesky factor of $A_{11}$, $U_{12} = (U_{11}^T)^{-1} A_{12}$ and $\bar{A}_{22} = A_{22} - U_{12}^T U_{12}$.

    The collinearity angle $\delta$ for the current column $i$ is determined by $\sin^2 \delta = u_{ii}^2 / a_{ii}$.

  2. Determine $G_1$, replacing the (rectangular) $U_{12}$, from $U_{11} G_1 = U_{12}$.
  3. Determine a (symmetric) $G_2$, replacing $\bar{A}_{22}$, from the rank basis $I + G_1^T G_1 = (G_2^T)^{-1} (G_2)^{-1}$.
  4. Determine the constraint equations $G = \begin{pmatrix} G_1 G_2 \\ G_2^T \end{pmatrix}$.
  5. Solve for the $n_1$ independent unknowns $x_1$, using $A_{11}$ and $L_1$, by back substitution.
  6. Solve the constraint equations for the remaining $n_2$ unknowns, using $G_2$, by back substitution: $x_2 = G_2^{-1} G_1^T x_1$.
  7. Make Baarda’s S-transform of the $n_1$ independent unknowns: $x_1 \leftarrow x_1 + G_1 x_2$.

Setting $F = A_{11}^{-1} L_1$, $D = G_2^{-1} G_1^T$ and $E = \begin{pmatrix} I_{n_1} + G_1 D \\ D \end{pmatrix}$, we can write the above steps as:

$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} I_{n_1} + G_1 D \\ D \end{pmatrix} A_{11}^{-1} L_1 = E\,F$  (38)
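The net effect of the scheme above is the minimum-norm (pseudo-inverse) solution. Below is a compact sketch that uses an SVD-based Moore-Penrose inverse in place of the pivoted Cholesky factorisation described above; the numerical route differs, but the resulting solution and covariance do not.

```python
import numpy as np

def mininorm_solve(A, L, rtol=1e-10):
    """Minimum-norm solution of a rank-deficient symmetric system A x = L,
    via the Moore-Penrose inverse; rtol sets the rank cut-off."""
    U, s, Vt = np.linalg.svd(A)
    keep = s > rtol * s[0]          # the independent part of the system
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    Apinv = (Vt.T * s_inv) @ U.T    # pseudo-inverse A^+
    return Apinv @ L, Apinv         # solution and its covariance, cf. (39)
```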

3.1.1 Errors

Errors are determined in the same way as described in (2.1.1), where it should be noted that (13) can be used either summing over the $n_1$ variables with the first guess of $x_1$, or over all $n$ with the final $\mathbf{x}$. Internally the first option is used. The uncertainties in $\mathbf{x}$ are determined by calculating the covariance matrix. Using arithmetic similar to that in (2.1.1), it can be shown that:

$A^{-1} = E\, A_{11}^{-1}\, E^T$  (39)

$A^{-1}$ is calculated by first solving $H = E\, A_{11}^{-1}$ (by back-substitution of $I_{n_1}$) and then forming $A^{-1} = E\, H^T$.

3.2 Known constraints

In the case of known constraints, i.e. when $B$ in (35) is known, this equation can be solved directly. Cholesky decomposition does not work in this case, since the bordered matrix is indefinite, and a transparent Crout LU decomposition is done to determine the (symmetric) covariance (i.e. inverse) matrix of the left-hand side of (35).
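A sketch of this case: build the bordered matrix of (35) and solve it with a general solver. numpy's generic LU-based solve stands in for the Crout decomposition mentioned above; the names are illustrative only.

```python
import numpy as np

def solve_with_constraints(A, B, L):
    """Solve the bordered system (35): A (n x n), B (n x p) known
    constraints, L (n); returns the unknowns x and the multipliers."""
    n, p = B.shape
    M = np.zeros((n + p, n + p))
    M[:n, :n] = A
    M[:n, n:] = B
    M[n:, :n] = B.T                  # lower-right p x p block stays zero
    rhs = np.concatenate([L, np.zeros(p)])
    sol = np.linalg.solve(M, rhs)    # LU solve; M is symmetric but indefinite
    return sol[:n], sol[n:]          # x and the Lagrange multipliers lambda
```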

3.2.1 Errors

Errors are determined as explained in (2.1.1). Since all constraint equations have zero right-hand sides, the sum in (13) is taken over $n$, not over $n + p$, where $p$ is the number of constraint equations.

4 Non-linear equations

In the case of non-linear condition equations, no simple direct solution for minimising the merit function (e.g. (6)) exists. A simple approach is to take the condition equations, make a first-order Taylor expansion around an estimate $\hat{\mathbf{x}}$, solve for $d\mathbf{x}$, and iterate until a solution is found.

However, better and more stable methods exist (e.g. steepest descent) for some circumstances. A method that seems to be quite stable, and that uses both steepest descent and Taylor expansion, is the Levenberg-Marquardt method (see e.g. “Numerical Recipes”, W.H. Press et al., Cambridge University Press).

If we have an estimate $\hat{\mathbf{x}}$, we can find a better one by:

$\hat{\mathbf{x}}_{\rm next} = \hat{\mathbf{x}} - H^{-1}\, \nabla \chi^2(\hat{\mathbf{x}})$  (40)

where $H$ is the Hessian matrix of $\chi^2$. If our approximation is not good enough, we can use steepest descent by calculating:

$\hat{\mathbf{x}}_{\rm next} = \hat{\mathbf{x}} - {\rm constant} \times \nabla \chi^2(\hat{\mathbf{x}})$  (41)

The Hessian matrix H has as elements:

$H_{ij} = \frac{\partial^2 \chi^2}{\partial x_i \partial x_j} = 2 \sum_{k=0}^{N-1} \frac{1}{\sigma_k^2} \left[ \frac{\partial y_k}{\partial x_i}\, \frac{\partial y_k}{\partial x_j} - (l_k - y_k)\, \frac{\partial^2 y_k}{\partial x_i \partial x_j} \right]$  (42)

By dropping the second term in (42), multiplying each diagonal term $H_{ii}$ by $(1 + \lambda)$, and defining the first derivative of $\chi^2$ as $\mathbf{b}$, we can combine equations (40) and (41) into:

$H\, d\mathbf{x} = \mathbf{b}$  (43)

which can be solved as standard normal equations.

Choosing $\lambda$, and deciding when to stop iterating, is the crux of the matter. A starting value of $\lambda = 0.001$ is used in the following routines. The method looks at the $\chi^2$ for $\hat{\mathbf{x}} + d\mathbf{x}$. If it has increased over the value of $\chi^2$ for $\hat{\mathbf{x}}$, then $\hat{\mathbf{x}}$ is left unchanged and $\lambda$ is multiplied by 10. If there is a decrease, the new value of $\hat{\mathbf{x}}$ is accepted and $\lambda$ is divided by 10. Iteration can be stopped once the fractional decrease in $\chi^2$ is small (say $< 0.001$), but never after an iteration in which $\chi^2$ increased.
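The $\lambda$ strategy just described can be sketched as follows. Here `residual(x)` is assumed to return the weighted residuals $(l_i - y_i)/\sigma_i$ and `jacobian(x)` the matrix $J_{ij} = (1/\sigma_i)\,\partial y_i/\partial x_j$; both names are invented for the example, and this is a sketch, not the aips++ routine.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x, maxiter=50, ftol=1e-3):
    lam = 0.001                               # starting value for lambda
    r = residual(x)
    chi2 = r @ r
    for _ in range(maxiter):
        J = jacobian(x)
        H = J.T @ J                           # (42) with second term dropped
        b = J.T @ r                           # right-hand side of (43)
        Hl = H + lam * np.diag(np.diag(H))    # multiply H_ii by (1 + lambda)
        dx = np.linalg.solve(Hl, b)
        r_new = residual(x + dx)
        chi2_new = r_new @ r_new
        if chi2_new > chi2:                   # worse: keep x, raise lambda
            lam *= 10.0
            continue
        small = (chi2 - chi2_new) / chi2 < ftol
        x, r, chi2 = x + dx, r_new, chi2_new  # better: accept, lower lambda
        lam /= 10.0
        if small:                             # small fractional decrease:
            break                             # stop (never stop on increase)
    return x, chi2
```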

4.1 Errors

Errors are calculated as described in (2.1.1). The errors returned by WNMLSN are, of course, only approximations, since the original equations are non-linear, but they give a good impression of the residuals. The covariance matrix should be calculated by doing a final linear run on the residuals and solving the equations.

5 Summary

An LSQ-object takes a total space of:

$9\ \text{(administration)} + \frac{(n+p)(n+p+1)}{2}\ \text{(normal array)} + 4m\ \text{(error calculations)} + m(n+p)\ \text{(known sides)} + \frac{n+p+1}{2}\ \text{(pivot area)} + (n+p)\ \text{(solution)}$ 8-byte words  (44)

where $n$ is the number of unknowns ($2n$ if complex), $m$ the number of simultaneously solved known sides, and $p$ the number of external constraint equations. In the non-linear case two LSQ-objects are used. In cases where the inverted normal array is calculated (known constraints), temporary storage of $n^2$ 8-byte words is used.
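As an illustration, taking (44) at face value for a simple real case with $n = 10$, $m = 1$ and $p = 0$: $9 + 10 \cdot 11/2 + 4 + 10 + \lceil 11/2 \rceil + 10 = 9 + 55 + 4 + 10 + 6 + 10 = 94$ 8-byte words, i.e. about 750 bytes (the pivot area is rounded up to a whole word).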

The following calls are available:

  1. To be given
