The MEASUREMENT EQUATION
of a generic radio telescope
AIPS++ Implementation Note nr 185

J.E.Noordam
(jnoordam@nfra.nl)

15 February 1996, version 2.0

File: /aips++/nfra/185.latex Symbols File: /aips++/nfra/megi-symbols.tex

Abstract: This note is a step towards an ‘official’ AIPS++ description of the Measurement Equation, based on an agreed set of names and conventions. The latter have been defined in a separate TeX file, and can (should) be used in subsequent AIPS++ documents to ensure consistency.

1 INTRODUCTION
2 THE M.E. FOR A SINGLE POINT SOURCE
2.1 The feed-based instrumental Jones matrices
2.2 The Jones matrix of a Tied Array feed
2.3 Jones matrices for multiple beams
3 THE FULL MEASUREMENT EQUATION
3.1 Summing and averaging
3.2 Interferometer-based effects
4 POLARISATION COORDINATES
5 GENERIC FORM OF JONES MATRICES
5.1 Ionospheric Faraday rotation ($F_i(\vec{ρ}, \vec{r}_i)$)
5.2 Atmospheric gain ($T_i(\vec{ρ}, \vec{r}_i)$)
5.3 Fourier Transform kernel ($K_i(\vec{r}_i \cdot \vec{ρ})$)
5.4 Projection matrix ($P_i$) if $γ_{xa} = γ_{yb}$
5.5 Projection matrix ($P_i$) if $γ_{xa} \neq γ_{yb}$
5.6 Voltage primary beam ($E_i(\vec{ρ})$)
5.7 Position-independent receptor cross-leakage ($D_i$)
5.8 Commutation ($Y_i$)
5.9 Hybrid ($H_i$)
5.10 Electronic gain ($G_i$)
5.11 Do we need a configuration matrix ($C_i$)?
6 THE ORDER OF JONES MATRICES
6.1 Overview of commutation properties
6.2 Overview of Jones matrix forms
6.3 Allowable changes of order
6.4 VisJones and SkyJones
6.4.1 Tied Array
A APPENDIX: CONVENTIONS
A.1 Some definitions
A.2 Labels, sub- and super-scripts
A.3 Coordinate frames
A.4 Matrices and vectors
A.5 Miscellaneous parameters

1 INTRODUCTION

The matrix-based Measurement Equation (ME) of a Generic Radio Telescope was developed by Hamaker, Bregman and Sault [2] [3], based on earlier work by Bregman [1]. After discussion by Noordam [5] and Cornwell [6] [7] [8] [9] [10] [11], the M.E. has been adopted as the generic foundation of the uv-data calibration and imaging part of AIPS++. In the not too distant future, an ‘official’ AIPS++ description of the ME will be needed, with agreed conventions and nomenclature (see Appendix A). This note is a step towards that goal.

The heart of the M.E. is formed by the $2 \times 2$ feed-based ‘Jones’ matrices, which describe the effects of the various parts of the observing instrument on the signal. The main section of this document (section 5) describes the basic form of the Jones matrices in linear and circular polarisation coordinates. Section 6 discusses the conditions under which their order may be modified (matrices do not always commute).

It is expected that the details of the M.E. (and of this note) will be refined during the first few iterations of design and implementation of AIPS++. But the structure of the M.E. formalism as presented here appears to be rich enough to accommodate all existing and planned radio telescopes. This includes ‘exotic’ ones like cylindrical mirrors, phased arrays, and interferometer arrays with very dissimilar antennas. Further refinements should only require the addition of new Jones matrices, or new expressions for existing matrix elements.

In order to test this bold assertion, the various institutes might endeavour to model their own telescopes in terms of the precise and common language of the M.E., using this note as a reference. The following ‘rules’ are probably good ones:

In modelling an instrument, stay as close to the actual physical situation as possible. Violations of this principle, for whatever reasons, will lead to problems sooner or later.
It is counterproductive to try to simplify the M.E. to make it ‘look more tractable’. This practice introduces hidden assumptions, which tend to be forgotten by the programmer and are unknown to the user.
Use the suggested nomenclature and conventions.

It is also good to realise that there are two basic forms of the M.E., which should not be confused. In the physical form, each instrumental effect is modelled separately by its own matrix; this is useful for simulation purposes. In the mathematical form, effects are ‘lumped together’ if they cannot be solved for separately; an example is the combination of the various contributions to the receiver gain with the tropospheric gain.

Acknowledgements: The author has greatly benefited from detailed discussions with Jayaram Chengalur, Jaap Bregman, Johan Hamaker, Tim Cornwell, Wim Brouw and Mark Wieringa.

2 THE M.E. FOR A SINGLE POINT SOURCE

For the moment, it will be assumed that there is a single point source at an arbitrary position (direction) $\vec{ρ} = \vec{ρ}(l, m)$ w.r.t. the fringe-tracking centre, and that observing bandwidth and integration time are negligible. Multiple and extended sources, and the effects of non-zero bandwidth and integration time, will be treated in the Full Measurement Equation in section 3.

For a given interferometer, the measured visibilities can be written as a 4-element ‘coherency vector’ $\vec{V}_{ij}$, which is related to the so-called ‘Stokes vector’ $\vec{I}(l, m)$ of the observed source by a matrix equation:

\vec{V}_{ij} = \begin{pmatrix} v_{ipjp} \\ v_{ipjq} \\ v_{iqjp} \\ v_{iqjq} \end{pmatrix} = (J_i \otimes J_j^*)\, S \begin{pmatrix} I \\ Q \\ U \\ V \end{pmatrix}_{l, m}

(1)

The subscripts $i$ and $j$ are the labels of the two feeds that make up the interferometer. The subscripts $p$ and $q$ are the labels of the two output IF-channels from each feed.¹

The ‘Stokes matrix’ $S$ is a constant $4 \times 4$ coordinate transformation matrix, discussed in detail in section 4 below. The real heart of the M.E. is the ‘direct matrix product’ $J_i \otimes J_j^*$ of two $2 \times 2$ feed-based Jones matrices.

The ‘Stokes-to-Stokes’ transmission of a Stokes vector through an ‘optical’ element may be described by multiplication with a $4 \times 4$ Mueller matrix $ℳ_{ij}$ [2] [3]. Using equation 1:

\vec{I}^{\,\mathrm{out}}(l, m) = S^{-1} \vec{V}_{ij} = S^{-1} (J_i \otimes J_j^*)\, S\, \vec{I}^{\,\mathrm{in}}(l, m) = ℳ_{ij}(l, m)\, \vec{I}^{\,\mathrm{in}}(l, m)

(2)

Mueller matrices are useful in simulation, when studying how instrumental effects act on a test source $\vec{I}(l, m)$. They can easily be generalised to the full M.E. (see section 3).
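As a minimal numerical sketch of equation 2 (plain Python, matrices as nested lists; all function names are illustrative and not AIPS++ code), the Mueller matrix of an interferometer can be computed from two Jones matrices and the linear Stokes matrix $S^{+}$ of section 4, using $S^{-1} = 2\,S^{*T}$. The example applies the same Faraday rotation $\mathrm{Rot}(χ)$ to both feeds. This should rotate $(Q, U)$ by $2χ$, so the $Q \rightarrow Q$ element becomes $\cos 2χ$ and the $Q \rightarrow U$ element becomes $\sin 2χ$.

```python
import math

# Linear Stokes matrix S+ (equation 15): maps (I, Q, U, V) to (xx, xy, yx, yy).
S_LIN = [[0.5 * v for v in row] for row in (
    (1, 1, 0, 0),
    (0, 0, 1, 1j),
    (0, 0, 1, -1j),
    (1, -1, 0, 0),
)]


def matmul(a, b):
    """Matrix product of nested lists with compatible shapes."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def kron(a, b):
    """Direct (Kronecker) product a (x) b."""
    rb, cb = len(b), len(b[0])
    return [[a[r // rb][c // cb] * b[r % rb][c % cb] for c in range(len(a[0]) * cb)]
            for r in range(len(a) * rb)]


def conj(a):
    """Element-wise complex conjugate."""
    return [[complex(x).conjugate() for x in row] for row in a]


def mueller(j_i, j_j, s=S_LIN):
    """Equation 2: M_ij = S^-1 (J_i (x) J_j^*) S, with S^-1 = 2 S^{*T}."""
    s_inv = [[2 * x for x in col] for col in zip(*conj(s))]
    return matmul(s_inv, matmul(kron(j_i, conj(j_j)), s))


def rot(angle):
    """Rotation matrix Rot(angle) in the form of equation 29."""
    return [[math.cos(angle), -math.sin(angle)], [math.sin(angle), math.cos(angle)]]


chi = 0.3                                   # hypothetical Faraday rotation angle (rad)
m = mueller(rot(chi), rot(chi))
print(round(m[1][1].real, 6), round(m[2][1].real, 6))   # cos(2 chi), sin(2 chi)
```

With identity Jones matrices the same function returns the $4 \times 4$ unit matrix, as expected for a perfect instrument.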

2.1 The feed-based instrumental Jones matrices

It will be assumed (for the moment) that all instrumental effects can be factored into feed-based contributions, i.e. any interferometer-based effects are assumed to be negligible (see section 3). The $4 \times 4$ interferometer response matrix $J_{ij}$ then consists of a ‘direct matrix product’² $J_i \otimes J_j^*$ of two $2 \times 2$ feed-based response matrices, called ‘Jones matrices’. The reader will note that this factoring is the polarimetric generalisation of the familiar ‘Selfcal assumption’, in which the (scalar) gains are assumed to be feed-based rather than interferometer-based.

The $2 \times 2$ Jones matrix $J_i$ for feed $i$ can be decomposed into a product of several $2 \times 2$ Jones matrices, each of which models a specific feed-based instrumental effect in the signal path:

J_i = G_i [H_i] [Y_i] B_i K_i T_i F_i = G_i [H_i] [Y_i] (D_i E_i P_i) K_i T_i F_i

(3)

in which

$F_i(\vec{ρ}, \vec{r}_i)$: ionospheric Faraday rotation
$T_i(\vec{ρ}, \vec{r}_i)$: atmospheric complex gain
$K_i(\vec{ρ} \cdot \vec{r}_i)$: factored Fourier Transform kernel
$P_i$: projected receptor orientation(s) w.r.t. the sky
$E_i(\vec{ρ})$: voltage primary beam
$D_i$: position-independent receptor cross-leakage
$[Y_i]$: commutation of IF-channels
$[H_i]$: hybrid (conversion to circular polarisation coordinates)
$G_i$: electronic complex gain (feed-based contributions only)

Matrices between brackets ([ ]) are not present in all systems. $B_i$ is the ‘Total Voltage Pattern’ of an arbitrary feed, which is usually split up into three sub-matrices: $D_i E_i P_i$. Jones matrices that model ‘image-plane’ effects depend on the source position (direction) $\vec{ρ}$. Some also depend on the antenna position $\vec{r}_i$. Of course, most of them depend on time and frequency as well. The various Jones matrices are treated in some detail in section 5.

Since the Jones matrices do not always commute with each other, their order is important. In principle, they should be placed in the ‘physical’ order, i.e. the order in which the signal is affected by them while traversing the instrument. In practice, this is not always possible or desirable. Section 6 discusses the implications of choosing a different order.
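A small sketch makes the point concrete (plain Python; the gain values are hypothetical). A rotation such as $P_i = \mathrm{Rot}(γ_{xa})$ does not commute with a diagonal gain matrix whose two elements differ. It does commute with a scalar matrix such as $\mathrm{Mult}(t_i)$. Whether two matrices in equation 3 may be swapped therefore depends on their form.

```python
import math


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[a[r][0] * b[0][c] + a[r][1] * b[1][c] for c in range(2)] for r in range(2)]


def rot(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def diag(g_a, g_b):
    return [[g_a, 0], [0, g_b]]


def commutes(a, b, tol=1e-12):
    """True if a b == b a to within tol (element-wise)."""
    ab, ba = matmul2(a, b), matmul2(b, a)
    return all(abs(ab[r][c] - ba[r][c]) < tol for r in range(2) for c in range(2))


p = rot(0.4)                          # projection matrix with gamma_xa = 0.4 rad
g_unequal = diag(1.10, 0.95 + 0.05j)  # hypothetical electronic gains, different per IF-channel
g_scalar = diag(1.05, 1.05)           # equal elements, i.e. a Mult(t)-like matrix
print(commutes(p, g_unequal), commutes(p, g_scalar))   # False True
```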

2.2 The Jones matrix of a Tied Array feed

The output signal of each of the two IF-channels of a ‘tied array’ is the weighted sum of the corresponding IF-channel signals from $n$ individual feeds. A tied array is itself a feed (see definition in Appendix A), modelled by its own Jones matrix. For a single point source, we get:

J_i^{\mathrm{tied\,array}} = Q_i \sum_n w_{in} J_{in}

(4)

and for an interferometer between two tied arrays $i$ and $j$ with $n$ and $m$ constituent feeds respectively:

J_{ij} = (J_i \otimes J_j^*) = (Q_i \otimes Q_j^*) \sum_n \sum_m w_{in} w_{jm}^* (J_{in} \otimes J_{jm}^*)

(5)

See also section 6.4. The matrix $Q_i$ models electronic gain effects on the summed signal of the tied array feed $i$. The $Q_i$ can be solved for by the usual Selfcal methods, in contrast to instrumental errors in the constituent feeds before summation. The latter will often cause decorrelation, and thus closure errors in an interferometer.
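The consistency of equations 4 and 5 can be checked numerically. This is a minimal sketch with hypothetical constituent Jones matrices, output gains $Q$ and complex phasing weights $w$; the helper names are illustrative. One array's weights enter through $J_j^*$ and are therefore conjugated. By the mixed-product property of the direct product, forming the two tied-array Jones matrices first gives the same $4 \times 4$ result as summing over all pairs of constituent feeds.

```python
import cmath


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def kron(a, b):
    rb, cb = len(b), len(b[0])
    return [[a[r // rb][c // cb] * b[r % rb][c % cb] for c in range(len(a[0]) * cb)]
            for r in range(len(a) * rb)]


def conj(a):
    return [[complex(x).conjugate() for x in row] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(w, a):
    return [[w * x for x in row] for row in a]


def tied_array(q, weights, jones):
    """Equation 4: J = Q sum_n w_n J_n, with (possibly complex) phasing weights w_n."""
    total = [[0j, 0j], [0j, 0j]]
    for w, j in zip(weights, jones):
        total = add(total, scale(w, j))
    return matmul(q, total)


def tied_baseline(q_i, w_i, j_i, q_j, w_j, j_j):
    """Equation 5: the second array's weights enter conjugated, via J_j^*."""
    total = [[0j] * 4 for _ in range(4)]
    for wn, jn in zip(w_i, j_i):
        for wm, jm in zip(w_j, j_j):
            total = add(total, scale(wn * complex(wm).conjugate(), kron(jn, conj(jm))))
    return matmul(kron(q_i, conj(q_j)), total)


# Hypothetical constituent feeds, output gains and phasing weights
ja = [[1.0, 0.02j], [0.01, 0.98]]
jb = [[0.95 + 0.1j, 0.0], [0.03, 1.02]]
jc = [[1.01, -0.02], [0.0, 0.99j]]
q1, q2 = [[1.2, 0], [0, 0.8]], [[0.9j, 0], [0, 1.1]]
w1 = [cmath.exp(0.3j), cmath.exp(-0.7j)]
w2 = [0.5, cmath.exp(1.1j)]

direct = kron(tied_array(q1, w1, [ja, jb]), conj(tied_array(q2, w2, [jb, jc])))
summed = tied_baseline(q1, w1, [ja, jb], q2, w2, [jb, jc])
print(max(abs(x - y) for rd, rs in zip(direct, summed) for x, y in zip(rd, rs)))  # rounding level
```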

Since a tied array feed can be modelled by a Jones matrix, it can be combined with any other type of feed to form an interferometer. Examples are the use of the WSRT and the VLA as tied arrays in VLBI arrays. Note that this is made possible by factoring the Fourier Transform kernel $K_{ij}(\vec{u}_{ij} \cdot \vec{ρ})$ into $K_i(\vec{r}_i \cdot \vec{ρ})$ and $K_j(\vec{r}_j \cdot \vec{ρ})$, and including these factors in the Jones matrices of the individual feeds (see equ 28).

Obviously, the primary beam of a tied array can be rather complicated, but it is fully modelled by equ 4. Moreover, the contributing feeds in a tied array are allowed to be quite dissimilar. It is not even necessary for their receptors (dipoles) to be aligned with each other! Thus, equation 4 can also be used to model ‘difficult’ telescopes like Ooty or MOST, or an element of the future Square Km Array (SKAI). This puts the crown on the remarkable power of the Measurement Equation.

2.3 Jones matrices for multiple beams

Using the definition in Appendix A, each beam in a multiple beam system should be treated like a separate logical feed, modelled by its own Jones matrix. Any commonality between them can be modelled in the form of shared parameters in the expressions for the various matrix elements.

3 THE FULL MEASUREMENT EQUATION

3.1 Summing and averaging

For $k$ ‘real’ incoherent sources, observed with a ‘real’ telescope, equ 1 becomes:

\vec{V}_{ij} = \frac{1}{Δt\,Δf} \int dt \int df \sum_k \frac{1}{Δl\,Δm} \int dl\,dm\; (J_i \otimes J_j^*)\, S\, \vec{I}_k(l, m)

(6)

The visibility vector $\vec{V} i j$ is integrated over the extent of the sources ( $\int d l d m$ ), over the integration time ( $\int d t$ ) and over the channel bandwidth ( $\int d f$ ). Integration over the aperture ( $\int d u d v$ ) is taken care of by the primary beam properties.

There are only four integration coordinates, whose units are determined by the flux density units in which $\vec{I}$ is expressed: energy/sec/Hz/beam. These coordinates define a 4-dimensional ‘integration cell’. If the variation of $\vec{V}(f, t, l, m)$ is linear over this cell, integration is not necessary:

\vec{V} i j = \sum_{k} {\vec{V}}_{0 k} (f_{0}, t_{0}, l_{0}, m_{0})

(7)

in which $\vec{V}_{0k}$ is the value for source $k$ at the centre of the cell, for $Δf = 1$ Hz and $Δt = 1$ sec. If the variation of $\vec{V}(f, t, l, m)$ over the cell can be approximated by a polynomial of order $\leq 3$, then it is sufficient to calculate only the 2nd derivative(s) at the centre of the cell:

\vec{V}^{\,\mathrm{int}}_{ij} = \sum_k \left[ \vec{V}_{0k} + \frac{1}{24} \left( \frac{\partial^2 \vec{V}_{0k}}{\partial f^2} (Δf)^2 + \frac{\partial^2 \vec{V}_{0k}}{\partial t^2} (Δt)^2 + \frac{\partial^2 \vec{V}_{0k}}{\partial l^2} (Δl)^2 + \frac{\partial^2 \vec{V}_{0k}}{\partial m^2} (Δm)^2 \right) \right]

(8)

Here it is assumed that the 2nd derivatives are constant over the cell, i.e. that the cross-derivatives $\partial^2 \vec{V}_0 / \partial p_1 \partial p_2$ are zero.
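The 1-dimensional version of this correction is easy to check. Expand $\vec{V}$ about the cell centre and average over a cell of width $Δ$. The odd-order terms cancel, and the mean of $x^2$ over the cell is $Δ^2/12$. The correction term is therefore $\frac{1}{2}\,(\partial^2 \vec{V}/\partial x^2)\,Δ^2/12 = (\partial^2 \vec{V}/\partial x^2)\,Δ^2/24$.

The sketch below uses a hypothetical cubic frequency dependence of one visibility component. It compares the second-order estimate with the exact cell average, computed with Simpson's rule, which is exact for polynomials of order $\leq 3$. All names and values are illustrative.

```python
def cell_average(func, centre, width):
    """Exact mean of a polynomial of order <= 3 over one cell (Simpson's rule)."""
    half = width / 2
    return (func(centre - half) + 4 * func(centre) + func(centre + half)) / 6


def second_order_estimate(value, second_derivative, width):
    """Centre value plus the second-derivative correction for one coordinate."""
    return value + second_derivative * width ** 2 / 24


def vis(f):
    """Hypothetical cubic variation of one visibility component with frequency."""
    return 2.0 + 0.3 * f - 0.8 * f ** 2 + 0.05 * f ** 3


f0, df = 1.5, 0.4                 # cell centre and channel width (arbitrary units)
d2 = -1.6 + 0.3 * f0              # analytic second derivative of vis at f0
print(cell_average(vis, f0, df), second_order_estimate(vis(f0), d2, df))
```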

3.2 Interferometer-based effects

Until now, we have assumed that all instrumental effects could be factored into feed-based contributions, i.e. we have ignored any interferometer-based effects. This is justified for a well-designed system, provided that the signal-to-noise ratio is large enough (thermal noise causes interferometer-based errors, albeit with an average of zero). However, if systematic interferometer-based errors do occur, they can be modelled:

\vec{V}'_{ij} = X_{ij} (\vec{A}_{ij} + M_{ij} \vec{V}_{ij})

(9)

The $4 \times 4$ diagonal matrix $X_{ij}$, the ‘Correlator matrix’, represents interferometer-based corrections that are applied to the uv-data in software by the on-line system. An example is the Van Vleck correction. In the newest correlators, $X_{ij}$ approaches a constant $x$ times the unit matrix $𝒰$.

X_{ij} = \begin{pmatrix} x_{ipjp} & 0 & 0 & 0 \\ 0 & x_{ipjq} & 0 & 0 \\ 0 & 0 & x_{iqjp} & 0 \\ 0 & 0 & 0 & x_{iqjq} \end{pmatrix} \approx x\,𝒰

(10)

The $4 \times 4$ diagonal matrix $M_{ij}$ represents multiplicative interferometer-based effects.

M_{ij} = \begin{pmatrix} m_{ipjp} & 0 & 0 & 0 \\ 0 & m_{ipjq} & 0 & 0 \\ 0 & 0 & m_{iqjp} & 0 \\ 0 & 0 & 0 & m_{iqjq} \end{pmatrix} \approx 𝒰

(11)

The 4-element vector $\vec{A}_{ij}$ represents additive interferometer-based effects. Examples are receiver noise and correlator offsets.

\vec{A}_{ij} = \begin{pmatrix} a_{ipjp} \\ a_{ipjq} \\ a_{iqjp} \\ a_{iqjq} \end{pmatrix} \approx \vec{0}

(12)

In some cases, interferometer-based effects can be calibrated, e.g. when they appear to be constant in time. It will be interesting to see how many of them will disappear as a result of better modelling with the Measurement Equation. In any case, it is desirable that the cause of interferometer-based effects is properly understood (simulation!).
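Because $X_{ij}$ and $M_{ij}$ are diagonal, equation 9 acts on each of the four correlations separately. The minimal sketch below applies the model and then removes it again, for effects that are known and constant in time. The function names and values are hypothetical, and the diagonals are stored as 4-element lists in the order $pp, pq, qp, qq$.

```python
def corrupt(v, x_diag, m_diag, a):
    """Equation 9, element-wise: V'_ij = X_ij (A_ij + M_ij V_ij)."""
    return [x * (ai + mi * vi) for x, mi, ai, vi in zip(x_diag, m_diag, a, v)]


def correct(v_obs, x_diag, m_diag, a):
    """Inverse of equation 9, assuming X, M and A are known and X, M have no zero elements."""
    return [(vo / x - ai) / mi for vo, x, mi, ai in zip(v_obs, x_diag, m_diag, a)]


v_true = [1.0 + 0j, 0.02 + 0.01j, 0.02 - 0.01j, 0.97 + 0j]   # hypothetical coherencies
x_diag = [1.0] * 4                                            # near-ideal correlator, X ~ x U
m_diag = [1.01, 0.99 + 0.01j, 0.99 - 0.01j, 1.02]             # small multiplicative errors
a_off = [0.001, 0.0005j, -0.0005j, 0.002]                     # e.g. correlator offsets

v_obs = corrupt(v_true, x_diag, m_diag, a_off)
v_back = correct(v_obs, x_diag, m_diag, a_off)
print(max(abs(b - t) for b, t in zip(v_back, v_true)))        # rounding level
```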

4 POLARISATION COORDINATES

In the $2 \times 2$ signal domain, the electric field vector $\vec{E}$ of the incident plane wave can be represented either in a linear polarisation coordinate frame $(x, y)$ or in a circular polarisation coordinate frame $(r, l)$. Jones matrices are linear operators in the chosen frame:

\vec{V}^{+}_i = \begin{pmatrix} v_{ip} \\ v_{iq} \end{pmatrix} = J^{+}_i \begin{pmatrix} e_x \\ e_y \end{pmatrix} \quad \text{or} \quad \vec{V}^{\odot}_i = J^{\odot}_i \begin{pmatrix} e_r \\ e_l \end{pmatrix}

(13)

For linear polarisation coordinates, equation 1 becomes:

\vec{V}^{+}_{ij} = (J^{+}_i \otimes J^{+*}_j)(\vec{E} \otimes \vec{E}^*) = (J^{+}_i \otimes J^{+*}_j) \begin{pmatrix} e_x e_x^* \\ e_x e_y^* \\ e_y e_x^* \\ e_y e_y^* \end{pmatrix} = (J^{+}_i \otimes J^{+*}_j)\, S^{+} \vec{I}(l, m)

(14)

and there is a similar expression for circular polarisation coordinates. Thus, as emphasised in [2], the Stokes vector $\vec{I}(l, m)$ and the coherency vector $\vec{V}_{ij}$ represent the same physical quantity, but in different abstract coordinate frames. A ‘Stokes matrix’ $S$ is a coordinate transformation matrix in the $4 \times 4$ coherency domain: $S^{+}$ transforms the representation from Stokes coordinates $(I, Q, U, V)$ to linear polarisation coordinates $(xx, xy, yx, yy)$. Similarly, $S^{\odot}$ transforms to circular polarisation coordinates $(rr, rl, lr, ll)$. Following the convention of [4], we write:³

S^{+} = \frac{1}{2} \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & i \\ 0 & 0 & 1 & -i \\ 1 & -1 & 0 & 0 \end{pmatrix}, \qquad S^{\odot} = \frac{1}{2} \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & i & 0 \\ 0 & 1 & -i & 0 \\ 1 & 0 & 0 & -1 \end{pmatrix}

(15)

$S$-matrices are unitary up to a normalising constant: $S^{-1} = 2\, S^{*T}$. $S$ cannot be factored into feed-based parts. The two Stokes matrices are related by:

S^{\odot} = (ℋ \otimes ℋ^*)\, S^{+}, \qquad S^{+} = (ℋ^{-1} \otimes (ℋ^{-1})^*)\, S^{\odot}

(16)

with⁴

ℋ = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}, \qquad ℋ^{-1} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}

(18)
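Equations 15, 16 and 18 can be checked numerically. The sketch below (plain Python; helper names are illustrative) verifies that $S^{\odot} = (ℋ \otimes ℋ^*)\,S^{+}$, that $S^{-1} = 2\,S^{*T}$ for both Stokes matrices, and that the given $ℋ^{-1}$ is indeed the inverse of $ℋ$.

```python
import math

R2 = 1 / math.sqrt(2)
H = [[R2, 1j * R2], [R2, -1j * R2]]                       # equation 18
H_INV = [[R2, R2], [-1j * R2, 1j * R2]]
S_LIN = [[0.5 * v for v in row] for row in (                # S+ of equation 15
    (1, 1, 0, 0), (0, 0, 1, 1j), (0, 0, 1, -1j), (1, -1, 0, 0))]
S_CIRC = [[0.5 * v for v in row] for row in (               # circular S of equation 15
    (1, 0, 0, 1), (0, 1, 1j, 0), (0, 1, -1j, 0), (1, 0, 0, -1))]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def kron(a, b):
    rb, cb = len(b), len(b[0])
    return [[a[r // rb][c // cb] * b[r % rb][c % cb] for c in range(len(a[0]) * cb)]
            for r in range(len(a) * rb)]


def conj(a):
    return [[complex(x).conjugate() for x in row] for row in a]


def dagger(a):
    """Conjugate transpose."""
    return [list(col) for col in zip(*conj(a))]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


IDENT4 = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
print(close(matmul(kron(H, conj(H)), S_LIN), S_CIRC))                  # equation 16
print(all(close(matmul([[2 * x for x in row] for row in dagger(s)], s), IDENT4)
          for s in (S_LIN, S_CIRC)))                                  # S^-1 = 2 S^{*T}
```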

Most Jones matrices will have the same form in both polarisation coordinate frames. But if a Jones matrix is expressed in terms of parameters that are defined in one of the two frames, it will have two different but related forms. This is the case for Faraday rotation $F_i$, receptor orientation $P_i$, and receptor cross-leakage $D_i$, in which the orientation w.r.t. the $x, y$ frame plays a role. The two forms of a Jones matrix $A$ can be converted into each other by the coordinate transformation matrix $ℋ$ and its inverse:

A^{\odot} = ℋ A^{+} ℋ^{-1}, \qquad A^{+} = ℋ^{-1} A^{\odot} ℋ

(19)

The conversion may be done by hand, using (the elements $a, b, c, d$ may be complex):

ℋ \begin{pmatrix} a & c \\ d & b \end{pmatrix} ℋ^{-1} = \frac{1}{2} \begin{pmatrix} (a + b) - i(c - d) & (a - b) + i(c + d) \\ (a - b) - i(c + d) & (a + b) + i(c - d) \end{pmatrix}

(20)

ℋ^{-1} \begin{pmatrix} a & c \\ d & b \end{pmatrix} ℋ = \frac{1}{2} \begin{pmatrix} a + b + c + d & i(a - b - c + d) \\ -i(a - b + c - d) & a + b - c - d \end{pmatrix}

(21)

Applying these general expressions to the rotation matrix $\mathrm{Rot}(α)$ and the ellipticity matrix $\mathrm{Ell}(α, -α)$ (see the Appendix for their definitions), the conversions are:

\begin{array}{rcl} ℋ\,\mathrm{Rot}(α)\,ℋ^{-1} & = & \mathrm{Diag}(e^{iα}, e^{-iα}) \\ ℋ\,\mathrm{Rot}(α, β)\,ℋ^{-1} & = & \text{see equation 33} \\ ℋ\,\mathrm{Ell}(α, -α)\,ℋ^{-1} & = & \mathrm{Rot}(α) \end{array}

(22)

\begin{array}{rcl} ℋ^{-1}\,\mathrm{Rot}(α)\,ℋ & = & \mathrm{Ell}(α, -α) \\ ℋ^{-1}\,\mathrm{Ell}(α, -α)\,ℋ & = & \mathrm{Diag}(e^{iα}, e^{-iα}) \end{array}

(23)
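The conversions of equations 22 and 23 can be checked in the same way. In the sketch below, `to_circular` and `to_linear` are hypothetical helper names for the two halves of equation 19. The explicit form used here for $\mathrm{Ell}(α, -α)$ is an assumption: $\begin{pmatrix} \cos α & i \sin α \\ i \sin α & \cos α \end{pmatrix}$ is what equation 21 gives for $ℋ^{-1}\,\mathrm{Rot}(α)\,ℋ$, and it should be compared with the definition in the Appendix.

```python
import cmath
import math

R2 = 1 / math.sqrt(2)
H = [[R2, 1j * R2], [R2, -1j * R2]]          # equation 18
H_INV = [[R2, R2], [-1j * R2, 1j * R2]]


def matmul2(a, b):
    return [[a[r][0] * b[0][c] + a[r][1] * b[1][c] for c in range(2)] for r in range(2)]


def to_circular(a):
    """Equation 19: A_circ = H A_lin H^-1."""
    return matmul2(matmul2(H, a), H_INV)


def to_linear(a):
    """Equation 19: A_lin = H^-1 A_circ H."""
    return matmul2(matmul2(H_INV, a), H)


def rot(alpha):
    return [[math.cos(alpha), -math.sin(alpha)], [math.sin(alpha), math.cos(alpha)]]


def ell(alpha):
    """Assumed form of Ell(alpha, -alpha), derived from equation 21 (see lead-in)."""
    return [[math.cos(alpha), 1j * math.sin(alpha)], [1j * math.sin(alpha), math.cos(alpha)]]


def diag_phase(alpha):
    return [[cmath.exp(1j * alpha), 0], [0, cmath.exp(-1j * alpha)]]


def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(2) for c in range(2))


alpha = 0.7
print(close(to_circular(rot(alpha)), diag_phase(alpha)))   # equation 22, first line
print(close(to_linear(rot(alpha)), ell(alpha)))            # equation 23, first line
```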

Usually, all matrices in a ‘Jones chain’ will be defined in the same coordinate frame. An exception is the case where linear dipole receptors are used in conjunction with a ‘hybrid’ $H_i$ to create pseudo-circular receptors:

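\begin{array}{rcl} J_i & = & G^{\odot}_i H_i D^{+}_i E^{+}_i P^{+}_i K^{+}_i T^{+}_i F^{+}_i \quad (\text{using } S = S^{+}) \\ & = & G^{\odot}_i (H_i D^{+}_i ℋ^{-1})\, ℋ E^{+}_i P^{+}_i K^{+}_i T^{+}_i F^{+}_i \quad (\text{using } S = S^{+}) \\ & = & G^{\odot}_i D^{\odot}_i\, ℋ E^{+}_i P^{+}_i K^{+}_i T^{+}_i F^{+}_i \quad (\text{using } S = S^{+}) \\ & = & G^{\odot}_i D^{\odot}_i E^{\odot}_i P^{\odot}_i K^{\odot}_i T^{\odot}_i F^{\odot}_i\, ℋ \quad (\text{using } S = S^{+}) \\ & = & G^{\odot}_i D^{\odot}_i E^{\odot}_i P^{\odot}_i K^{\odot}_i T^{\odot}_i F^{\odot}_i \quad (\text{using } S = S^{\odot}) \end{array}

(24)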

in which $H_i$ represents an electronic implementation of the coordinate transformation matrix $ℋ$. All these expressions are equivalent in the sense that, in conjunction with the indicated Stokes matrix, they produce a coherency vector in circular polarisation coordinates. The choice of which expression to use depends on whether one wishes to model the feed explicitly in terms of its physical (dipole) properties, or to regard it as a ‘black box’ circular feed with unknown internal structure.

5 GENERIC FORM OF JONES MATRICES

In this section, the ‘generic’ form of various $2 \times 2$ feed-based instrumental Jones matrices (operators) will be treated in some detail.

It will be noted that for each matrix, the 4 elements have been given an ‘official’ name (e.g. $f_{ixx}$). The (possibly naive) idea is that, if the structure of the Measurement Equation is more or less complete, these ‘standard’ matrix elements could be referred to explicitly by their official names in other AIPS++ documents (and code), for instance to replace them with specific expressions for particular telescopes or purposes.

The subscript convention is as follows: $y_{ibp}$ is an element of matrix $Y$ for feed $i$, which models the ‘coupling factor’ for the signal going from receptor $b$ to IF-channel $p$. Where possible, the expressions have been reduced to standard matrices like the diagonal matrix ($\mathrm{Diag}$), the rotation matrix ($\mathrm{Rot}$), etc. These are defined in the Appendix.

5.1 Ionospheric Faraday rotation ($F_i(\vec{ρ}, \vec{r}_i)$)

The matrix $F^{+}_i$ represents (ionospheric) Faraday rotation of the electric vector over an angle $χ_i$ w.r.t. the celestial $x, y$ frame. Since $χ_i$ is defined in one of the polarisation coordinate frames, there will be two different forms for $F_i$ (see also section 4). For linear polarisation coordinates:

F^{+}_i(\vec{ρ}, \vec{r}_i) = \begin{pmatrix} f_{ixx} & f_{iyx} \\ f_{ixy} & f_{iyy} \end{pmatrix} = \mathrm{Rot}(χ_i)

(25)

In circular polarisation coordinates, the matrix $F^{\odot}_i$ is a diagonal matrix which introduces a phase difference, or rather a delay difference. It expresses the fact that ionospheric Faraday rotation is caused by a (strongly frequency-dependent) difference in propagation velocity between right-hand and left-hand circularly polarised signals travelling through a charged medium like the ionosphere. In terms of the Faraday rotation angle $χ_i$ (see above), we get:

F^{\odot}_i(\vec{ρ}, \vec{r}_i) = \begin{pmatrix} f_{irr} & f_{ilr} \\ f_{irl} & f_{ill} \end{pmatrix} = ℋ F^{+}_i ℋ^{-1} = \mathrm{Diag}(e^{iχ_i}, e^{-iχ_i})

(26)

In principle, the Faraday rotation angle is a function of source direction and feed position: $χ_i = χ_i(\vec{ρ}, \vec{r}_i)$. However, Faraday rotation is a large-scale effect, so it will usually have the same value for all sources in the primary beam: $χ_i = χ(\vec{r}_i)$. For arrays smaller than a few km, the rotation angle will usually also be the same for all feeds: $χ_i = χ(t)$. These assumptions reduce the number of independent parameters considerably.

5.2 Atmospheric gain ($T_i(\vec{ρ}, \vec{r}_i)$)

The matrix $T^{+}_i$ represents complex atmospheric gain: refraction, extinction and perhaps non-isoplanaticity. Since $T^{+}_i$ does not depend on a polarisation coordinate frame, there is only one form:

T^{+}_i = T^{\odot}_i = T_i(\vec{ρ}, \vec{r}_i) \approx \begin{pmatrix} t_i & 0 \\ 0 & t_i \end{pmatrix} = \mathrm{Mult}(t_i)

(27)

The matrix is diagonal because the atmosphere is not expected to cause cross-talk. The diagonal elements are assumed to be equal, because the atmosphere is not expected to affect polarisation.

Atmospheric effects in the ‘pupil-plane’ (i.e. originating directly above the feeds) can be modelled with a complex gain. It is less clear how to deal with effects that originate higher up in the atmosphere, i.e. between pupil plane and image plane.

A phase screen over the array can be modelled as $t_i = e^{iψ_i}$, in which the phase $ψ_i$ is assumed to be a low-order 2D polynomial in the feed position $\vec{r}_i$: $ψ_i = a_0(t) + a_1(t)\,\vec{r}_i + a_2(t)\,\vec{r}_i^{\,2} + \dots$
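This is a minimal sketch of such a phase screen. It assumes a 2D feed position $\vec{r}_i = (x_i, y_i)$ and a 2-vector $a_1$, and it reads $\vec{r}_i^{\,2}$ as $|\vec{r}_i|^2$; all coefficients and names are hypothetical. It also shows a property that follows directly from $t_i t_j^* = e^{i(ψ_i - ψ_j)}$: the common term $a_0(t)$ drops out of every interferometer, so it cannot be determined from the visibilities.

```python
import cmath


def screen_phase(r, a0, a1, a2):
    """psi_i = a0 + a1 . r_i + a2 |r_i|^2 for a feed at r_i = (x, y) (hypothetical form)."""
    x, y = r
    return a0 + a1[0] * x + a1[1] * y + a2 * (x * x + y * y)


def atmospheric_gain(r, a0, a1, a2):
    """t_i = exp(i psi_i), the diagonal element of Mult(t_i) in equation 27."""
    return cmath.exp(1j * screen_phase(r, a0, a1, a2))


def baseline_gain(r_i, r_j, a0, a1, a2):
    """Scalar factor t_i t_j^* seen by interferometer ij."""
    return atmospheric_gain(r_i, a0, a1, a2) * atmospheric_gain(r_j, a0, a1, a2).conjugate()


feed_i, feed_j = (0.0, 0.0), (1.2, -0.4)   # hypothetical feed positions (km)
a1, a2 = (0.05, -0.02), 0.003               # hypothetical coefficients (rad/km, rad/km^2)
print(baseline_gain(feed_i, feed_j, 0.0, a1, a2))
print(baseline_gain(feed_i, feed_j, 2.5, a1, a2))   # equal to the line above within rounding
```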

5.3 Fourier Transform kernel ($K_i(\vec{r}_i \cdot \vec{ρ})$)

The matrix $K_i$ represents the Fourier Transform kernel, which can also be seen as a phase weight factor. It is factored into feed-based parts in order to be able to model a tied array (see section 2.2). Since $K_i$ does not depend on the polarisation coordinate frame, there is only one form:

K^{+}_i = K^{\odot}_i = K_i(\vec{r}_i \cdot \vec{ρ}) = \begin{pmatrix} k_{iaa} & 0 \\ 0 & k_{ibb} \end{pmatrix} = \begin{pmatrix} k_i & 0 \\ 0 & k_i \end{pmatrix} = \mathrm{Mult}(k_i)

(28)

in which $k_i = \frac{1}{\sqrt{n}} e^{i 2π \vec{r}_i \cdot \vec{ρ} / λ}$, which depends on the projected feed position $\vec{r}_i$ and the source direction $\vec{ρ} = \vec{ρ}(l, m)$ w.r.t. the fringe tracking centre $\vec{ρ}_{ftc}$, and $n = \sqrt{1 - l^2 - m^2} \approx 1 - 0.5(l^2 + m^2)$.

If $k_{iaa} = k_{ibb}$, the interferometer matrix $K_{ij} = (K_i \otimes K_j^*)$ is a $4 \times 4$ diagonal matrix with equal elements. This is equivalent to a multiplicative factor of the familiar form $k_{ij} = k_i k_j^* = \frac{1}{n} e^{i 2π (\vec{r}_i - \vec{r}_j) \cdot \vec{ρ} / λ} = \frac{1}{n} e^{i 2π \vec{u}_{ij} \cdot \vec{ρ}}$, with $\vec{u}_{ij} = (\vec{r}_i - \vec{r}_j)/λ$ the baseline in wavelengths, i.e. the Fourier Transform kernel or ‘phase weight’ for the baseline $\vec{u}_{ij}$. Here $\vec{u}_{ij} \cdot \vec{ρ} = ul + vm + w(n - 1)$; for small fields $n \approx 1$, so that $\vec{u}_{ij} \cdot \vec{ρ} \approx ul + vm$ and the transform becomes a 2D FT.

The receptors of a feed are practically always co-located, i.e. they have the same phase centre: $\vec{r}_{ia} = \vec{r}_{ib} = \vec{r}_i$, so $k_{iaa} = k_{ibb} = k_i$. But note that it is possible to model receptors that are not co-located, i.e. $\vec{r}_{ia} \neq \vec{r}_{ib}$. It is not immediately obvious why one would want to do such a thing, but it is good to know that the formalism allows it.
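The sketch below checks that the product of two feed-based factors $k_i k_j^*$ reproduces the baseline kernel $\frac{1}{n} e^{i 2π (\vec{r}_i - \vec{r}_j) \cdot \vec{ρ} / λ}$. This is what allows $K_i$ to be included in each feed's own Jones matrix. It is a minimal sketch: it assumes feed positions in metres in the same frame as $\vec{ρ} = (l, m, n - 1)$, the reading suggested by $\vec{u}_{ij} \cdot \vec{ρ} = ul + vm + w(n - 1)$. The function names and values are hypothetical.

```python
import cmath
import math


def direction(l, m):
    """Offset vector rho = (l, m, n - 1) w.r.t. the fringe-tracking centre, and n."""
    n = math.sqrt(1 - l * l - m * m)
    return (l, m, n - 1), n


def feed_kernel(r, l, m, wavelength):
    """k_i = exp(i 2 pi r_i . rho / lambda) / sqrt(n), as in equation 28."""
    rho, n = direction(l, m)
    phase = 2 * math.pi * sum(ri * p for ri, p in zip(r, rho)) / wavelength
    return cmath.exp(1j * phase) / math.sqrt(n)


def baseline_kernel(r_i, r_j, l, m, wavelength):
    """k_ij = exp(i 2 pi u_ij . rho) / n with u_ij = (r_i - r_j) / lambda."""
    rho, n = direction(l, m)
    u = [(a - b) / wavelength for a, b in zip(r_i, r_j)]
    return cmath.exp(2j * math.pi * sum(ui * p for ui, p in zip(u, rho))) / n


r_i, r_j = (120.0, -35.0, 4.0), (-60.0, 80.0, -2.5)   # hypothetical positions (m)
lam, l, m = 0.21, 0.01, -0.005                          # wavelength (m), small field offset
k_ij = feed_kernel(r_i, l, m, lam) * feed_kernel(r_j, l, m, lam).conjugate()
print(abs(k_ij - baseline_kernel(r_i, r_j, l, m, lam)))   # rounding level
```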

5.4 Projection matrix ($P_i$) if $γ_{xa} = γ_{yb}$

The ‘Projection matrix’ models the projected orientation of the receptors w.r.t. the electrical $x, y$ frame on the sky, as seen from the direction of the source (see also section 5.6 below). Since the orientations are defined in one of the polarisation coordinate frames, there will be two different forms for $P_i$ (see section 4). For linear polarisation coordinates:

P^{+}_i = \begin{pmatrix} p_{ixa} & p_{iya} \\ p_{ixb} & p_{iyb} \end{pmatrix} \equiv \begin{pmatrix} \cos γ_{xa} & -\sin γ_{xa} \\ \sin γ_{xa} & \cos γ_{xa} \end{pmatrix} = \mathrm{Rot}(γ_{xa})

(29)

in which $γ_{xa}$ is the projected angle between the positive $x$-axis and the orientation of receptor $a$ (see also Appendix A). There is an implicit assumption here that the feed has perpendicular receptors and is fully steerable, which is the case for the majority of existing telescopes. See the next section for the case where the projected orientations are not perpendicular ($γ_{xa} \neq γ_{yb}$).

For circular polarisation coordinates:

P^{\odot}_i = \begin{pmatrix} p_{ira} & p_{ila} \\ p_{irb} & p_{ilb} \end{pmatrix} = ℋ P^{+}_i ℋ^{-1} = \mathrm{Diag}(e^{iγ_{xa}}, e^{-iγ_{xa}})

(30)

It is sometimes useful to introduce an intermediate coordinate frame, attached to feed $i$. In that case, $γ_{xa} = γ_{xi} + γ_{ia} = β + γ_{ia}$. The ‘offset’ angle $γ_{ia}$ between receptor $a$ and the frame of feed $i$ will be zero in most cases. The angle $β$ is the parallactic angle, i.e. the angle between two great circles through the source, one passing through the celestial North Pole and the other through the local zenith. The parallactic angle is zero for an equatorial feed, and varies smoothly with $\mathrm{HA}(t)$ for an alt-az feed:

\begin{array}{rcl} \sin β \, \sin z & = & \cos \mathrm{LAT} \, \sin \mathrm{HA} \\ \cos β \, \sin z & = & \cos \mathrm{DEC} \, \sin \mathrm{LAT} - \sin \mathrm{DEC} \, \cos \mathrm{LAT} \, \cos \mathrm{HA} \end{array}

(31)

in which $z$ is the zenith distance of the source.
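The two right-hand sides of equation 31 are proportional to $\sin β$ and $\cos β$, with the same positive factor. The two-argument arctangent therefore gives $β$ over its full range without computing that factor. This is a minimal sketch; the function name is hypothetical and all angles are in radians.

```python
import math


def parallactic_angle(ha, dec, lat):
    """Parallactic angle beta from equation 31 (all angles in radians)."""
    y = math.cos(lat) * math.sin(ha)
    x = math.cos(dec) * math.sin(lat) - math.sin(dec) * math.cos(lat) * math.cos(ha)
    return math.atan2(y, x)


lat = math.radians(52.9)          # hypothetical site latitude
dec = math.radians(30.0)          # hypothetical source declination
for ha_hours in (-4, 0, 4):
    ha = math.radians(15 * ha_hours)
    print(ha_hours, round(math.degrees(parallactic_angle(ha, dec, lat)), 2))
```

For a source south of the zenith, $β$ is zero at transit and antisymmetric in hour angle.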

5.5 Projection matrix ($P_i$) if $γ_{xa} \neq γ_{yb}$

The M.E. formalism must also be able to deal with more ‘exotic’ antennas like parabolic cylinders (Arecibo, MOST) or horizontal dipole arrays (SKAI). In those cases, the projected angles of the two receptors will generally not be equal, i.e. $γ_{xa} \neq γ_{yb}$.

NB: The angle $γ_{yb}$ of receptor $b$ is defined w.r.t. the $y$-axis rather than the $x$-axis. This ensures that $γ_{yb} = γ_{xa}$, so that matrix $P^{+}_i$ reduces to a simple rotation $\mathrm{Rot}(γ_{xa})$ in the common case described in section 5.4 above.

For linear polarisation coordinates, $P^{+}_i$ becomes a ‘pseudo-rotation’ (compare with equ 29 above):

P^{+}_i = \begin{pmatrix} p_{ixa} & p_{iya} \\ p_{ixb} & p_{iyb} \end{pmatrix} \equiv \begin{pmatrix} \cos γ_{xa} & -\sin γ_{xa} \\ \sin γ_{yb} & \cos γ_{yb} \end{pmatrix} = \mathrm{Rot}(γ_{xa}, γ_{yb})

(32)

For circular polarisation coordinates:

\begin{array}{rcl} P^{\odot}_i & = & \begin{pmatrix} p_{ira} & p_{ila} \\ p_{irb} & p_{ilb} \end{pmatrix} = ℋ P^{+}_i ℋ^{-1} \\ & = & \frac{1}{2} \begin{pmatrix} \cos γ_{xa} + \cos γ_{yb} + i(\sin γ_{xa} + \sin γ_{yb}) & \cos γ_{xa} - \cos γ_{yb} - i(\sin γ_{xa} - \sin γ_{yb}) \\ \cos γ_{xa} - \cos γ_{yb} + i(\sin γ_{xa} - \sin γ_{yb}) & \cos γ_{xa} + \cos γ_{yb} - i(\sin γ_{xa} + \sin γ_{yb}) \end{pmatrix} \end{array}

(33)

Future large radio telescopes may have feeds in the form of dipole arrays, possibly tilted over an angle $α$ towards the South w.r.t. the local horizontal plane. In that case, the projected angle $γ_{xa}$ between a North-South (NS) dipole and the $x$-axis differs from the projected angle $γ_{yb}$ between an East-West (EW) dipole and the $y$-axis (I hope this is correct now):

\begin{array}{rcl} \cos γ_{xa} & = & \cos \mathrm{HA} \, \sin \mathrm{DEC} \, \cos(\mathrm{LAT} - α) - \cos \mathrm{DEC} \, \sin(\mathrm{LAT} - α) \\ \sin γ_{xa} & = & -\sin \mathrm{HA} \, \cos(\mathrm{LAT} - α) \\ \cos γ_{yb} & = & \cos \mathrm{HA} \\ \sin γ_{yb} & = & -\sin \mathrm{HA} \, \sin \mathrm{DEC} \end{array}

(34)

5.6 Voltage primary beam ($E_i(\vec{ρ})$)

The effects of the primary beam are ignored by [2], which deals implicitly with on-axis sources observed by feeds with fully steerable parabolic mirrors. The AIPS++ M.E. must of course deal with the general case, including ‘exotic’ telescopes like Arecibo, MOST and SKAI. To this end, we define a total voltage pattern matrix $B_i$, which fully describes the conversion of the incident electric field (V/m) into two voltages (V):

B^{+}_i(\vec{ρ}) = \begin{pmatrix} b_{ixa} & b_{iya} \\ b_{ixb} & b_{iyb} \end{pmatrix}, \qquad B^{\odot}_i(\vec{ρ}) = \begin{pmatrix} b_{ira} & b_{ila} \\ b_{irb} & b_{ilb} \end{pmatrix}

(35)

NB: Since the Jones matrix $J_i$ is feed-based, it deals with voltage beams. The power beam for interferometer $ij$ is modelled by $B_i \otimes B_j^*$. Note that the formalism deals implicitly with interferometers between feeds with quite dissimilar primary beams.

In practice, it is often convenient to split the matrix $B_i$ into a chain of sub-matrices:

It is always possible to split off a projection matrix $P_i$:
$B_i = (B_i P_i^{-1}) P_i = E'_i P_i$
See sections 5.4 and 5.5 above.
It is always possible to split off a position-independent leakage matrix $D_i$:
$E'_i P_i = D_i (D_i^{-1} E'_i) P_i = D_i E_i P_i$
See section 5.7 below.

This is most useful in the common case of a fully steerable parabolic antenna. The voltage patterns of its feed(s) have a fixed shape, which is rotated and translated w.r.t. the sky when the antenna is pointed in different directions. What remains after splitting off $P_i$ and $D_i$ is an (approximately) real and diagonal matrix $E_i$, which describes the position-dependent primary beam attenuation and the position-dependent leakage (see also equation 38 below):

E^{+}_i(\vec{ρ}) = E^{\odot}_i(\vec{ρ}) = E_i(\vec{ρ}) = \begin{pmatrix} e_{iaa} & e_{iba} \\ e_{iab} & e_{ibb} \end{pmatrix} \approx \mathrm{Diag}(e_{iaa}, e_{ibb})

(36)

As an example, the diagonal elements of $E^{+}_i$ for an idealised axially symmetric Gaussian beam and dipole receptors would look like:

\begin{array}{rcl} e_{iaa} & = & \exp\left\{ -\left[ \left( \frac{l''_{ia}}{σ_a (1 + 𝜖_a)} \right)^2 + \left( \frac{m''_{ia}}{σ_a (1 - 𝜖_a)} \right)^2 \right] \right\} \\ e_{ibb} & = & \exp\left\{ -\left[ \left( \frac{l''_{ib}}{σ_b (1 + 𝜖_b)} \right)^2 + \left( \frac{m''_{ib}}{σ_b (1 - 𝜖_b)} \right)^2 \right] \right\} \end{array}

(37)

Note that the two receptor beams are each described in their own coordinate frame, $(l''_{ia}, m''_{ia})$ and $(l''_{ib}, m''_{ib})$, projected on the sky (see Appendix A). The projection matrix $P_i$ only takes care of the electrical rotation, not of the rotation of the voltage beam on the sky!

Equation 37 illustrates that the voltage beam of a dipole receptor will be slightly elongated in the direction of the dipole by a factor $(1 + 𝜖)$, even if the mirror is perfectly circular and symmetrical. Obviously, the two asymmetric voltage beams of a feed will not coincide, because they are oriented differently. The resulting position-dependent difference is one cause of off-axis instrumental polarisation.
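The effect can be illustrated with equation 37. The sketch below assumes that each receptor frame $(l'', m'')$ is the sky frame $(l, m)$ rotated by the projected receptor angle, with $l''$ along the dipole. The exact frame definitions are in Appendix A. It also uses hypothetical values for $σ$ and $𝜖$, and the function name is illustrative.

On axis, both voltage beams equal 1. At an off-axis point along dipole $a$, the beam of $a$ is stronger than that of the perpendicular dipole $b$. A quarter turn away, the roles are reversed.

```python
import math


def dipole_voltage_beam(l, m, sigma, eps, receptor_angle):
    """Equation 37 for one receptor, in its own frame rotated by receptor_angle (radians)."""
    c, s = math.cos(receptor_angle), math.sin(receptor_angle)
    l_rec = c * l + s * m        # offset along the dipole (assumed l'' axis)
    m_rec = -s * l + c * m       # offset perpendicular to the dipole (assumed m'' axis)
    return math.exp(-((l_rec / (sigma * (1 + eps))) ** 2 + (m_rec / (sigma * (1 - eps))) ** 2))


sigma, eps = 0.02, 0.05            # hypothetical beam width (rad) and elongation
gamma_a = 0.0                      # receptor a along the l axis
gamma_b = gamma_a + math.pi / 2    # receptor b perpendicular to a

for l, m in ((0.0, 0.0), (0.01, 0.0), (0.0, 0.01)):
    e_aa = dipole_voltage_beam(l, m, sigma, eps, gamma_a)
    e_bb = dipole_voltage_beam(l, m, sigma, eps, gamma_b)
    print((l, m), round(e_aa, 4), round(e_bb, 4))
```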

In reality, things will be more complicated, especially for off-axis sources. For instance, standing waves between the primary mirror and the frontend box, or scattering off support legs, may cause position-dependent leakage terms. Since these cannot be part of $D_i$, they must be modelled as off-diagonal elements of $E_i$ itself.

In general, $E_i$ will be more complicated for antennas with less symmetry. In some exotic cases, it may not be very useful to split off $D_i$ or even $P_i$, although it is always allowed. In any case, the M.E. formalism offers a framework for the full description of the primary beam of any radio telescope that can be conceived.

5.7 Position-independent receptor cross-leakage ($D_i$)

The off-diagonal elements $e_{iba}$ and $e_{iab}$ of $E'_i$ describe ‘leakage’ between receptors, i.e. the extent to which each receptor is sensitive to the radiation that is supposed to be picked up by the other one.

It is customary to split off the position-independent part $e'_{iba}$ and $e'_{iab}$ of this leakage into a separate matrix $D_i$:

$$\begin{aligned} E'_i(\vec{\rho}) &= \begin{pmatrix} e_{iaa} & e_{iba} + e'_{iba} \\ e_{iab} + e'_{iab} & e_{ibb} \end{pmatrix} \\ &\approx \begin{pmatrix} 1 & e'_{iba}/e_{ibb} \\ e'_{iab}/e_{iaa} & 1 \end{pmatrix} \begin{pmatrix} e_{iaa} & e_{iba} \\ e_{iab} & e_{ibb} \end{pmatrix} \\ &= \begin{pmatrix} d_{iaa} & d_{iba} \\ d_{iab} & d_{ibb} \end{pmatrix} \begin{pmatrix} e_{iaa} & e_{iba} \\ e_{iab} & e_{ibb} \end{pmatrix} = D_i E_i(\vec{\rho}) \end{aligned} \qquad (38)$$
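The approximation in equation 38 can be checked numerically. The sketch below, with purely hypothetical complex leakage values, builds $E'_i$, $D_i$ and $E_i$ exactly as written in equation 38, and shows that the off-diagonal elements of $D_i E_i$ reproduce $E'_i$ exactly, while the diagonal elements differ only by a second-order product of leakage terms.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def split_leakage(e_aa, e_ab, e_ba, e_bb, ep_ab, ep_ba):
    """Return (E_prime, D, E) as defined in equation 38.

    e_ab, e_ba are the position-dependent leakages; ep_ab, ep_ba are the
    position-independent parts e'_iab, e'_iba that are moved into D.
    """
    E_prime = [[e_aa, e_ba + ep_ba], [e_ab + ep_ab, e_bb]]
    D = [[1.0, ep_ba / e_bb], [ep_ab / e_aa, 1.0]]
    E = [[e_aa, e_ba], [e_ab, e_bb]]
    return E_prime, D, E


# Hypothetical values for one feed at one sky position.
E_prime, D, E = split_leakage(e_aa=0.90, e_ab=0.010 + 0.005j, e_ba=-0.008 + 0.010j,
                              e_bb=0.85, ep_ab=0.030 - 0.010j, ep_ba=0.020 + 0.020j)
DE = matmul(D, E)
residual = [[E_prime[r][c] - DE[r][c] for c in range(2)] for r in range(2)]
for row in residual:
    print(["%.2e" % abs(x) for x in row])
```

The diagonal residuals are $-e'_{iba}e_{iab}/e_{ibb}$ and $-e'_{iab}e_{iba}/e_{iaa}$, i.e. products of two small leakages, which is what justifies the $\approx$ sign.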

Usually, the position-dependent leakage coefficients $e_{iba}$ and $e_{iab}$ are assumed to be zero, but that is not always justified.

If the leakage coefficients are determined empirically by calibration, it is not necessary to know the details of the leakage mechanism. It is sufficient to solve for the elements of $D_i$. In that case, there is only one form:

$$D^{+}_i = D^{\odot}_i = D_i = \begin{pmatrix} d_{iaa} & d_{iba} \\ d_{iab} & d_{ibb} \end{pmatrix} \qquad (39)$$

But in many cases, position-independent leakage can be physically explained by deviations $\phi$ from the nominal receptor position angles (see $P_i$), and by deviations $\theta$ from the nominal receptor ‘ellipticities’. For linear polarisation coordinates:

$$\begin{aligned} D^{+}_i = \begin{pmatrix} d_{iaa} & d_{iba} \\ d_{iab} & d_{ibb} \end{pmatrix} &= \mathrm{Ell}(\theta_{ia}, \theta_{ib})\,\mathrm{Rot}(\phi_{ia}, \phi_{ib}) \\ &\approx \mathrm{Ell}(\theta_{ia}, -\theta_{ia})\,\mathrm{Rot}(\phi_{ia}) \end{aligned} \qquad (40)$$

The $\approx$ sign gives the approximation for a well-designed system. Often the two receptors are mounted in a single unit, so position angle deviations caused by mechanical bending of the feed structure are the same for both: $\phi_{ia} = \phi_{ib}$. One might also argue that ellipticity should be a reciprocal effect, so that $\theta_{ib} = -\theta_{ia}$. This is roughly consistent with WSRT experience, and these two assumptions are implicit in equ 27 of [3]. However, for high-accuracy polarisation measurements, the parameters for each receptor should be at least partly independent.
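To see what equation 40 implies for the individual leakage terms, the following sketch (hypothetical angle values, with Rot and Ell taken from equations 59 and 60) multiplies out $\mathrm{Ell}(\theta_{ia}, -\theta_{ia})\,\mathrm{Rot}(\phi_{ia})$. To first order in the small angles, and with these definitions, $d_{iba} \approx -\phi_{ia} + i\theta_{ia}$ and $d_{iab} \approx \phi_{ia} + i\theta_{ia}$: a position-angle error produces a real, antisymmetric leakage, an ellipticity error an imaginary, symmetric one.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def rot(alpha, beta=None):
    """(Pseudo) rotation Rot(alpha, beta) of equation 59; Rot(alpha) = Rot(alpha, alpha)."""
    beta = alpha if beta is None else beta
    return [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(beta), math.cos(beta)]]


def ell(alpha, beta=None):
    """Ellipticity Ell(alpha, beta) of equation 60; Ell(alpha) = Ell(alpha, -alpha)."""
    beta = -alpha if beta is None else beta
    return [[math.cos(alpha), 1j * math.sin(alpha)],
            [-1j * math.sin(beta), math.cos(beta)]]


def leakage_matrix(theta_a, theta_b, phi_a, phi_b):
    """D+_i = Ell(theta_a, theta_b) Rot(phi_a, phi_b), the exact form of equation 40."""
    return matmul(ell(theta_a, theta_b), rot(phi_a, phi_b))


theta, phi = 0.01, 0.02                       # hypothetical small errors, in radians
D = leakage_matrix(theta, -theta, phi, phi)   # the 'well-designed' approximation
d_ba, d_ab = D[0][1], D[1][0]
print("d_iba =", d_ba, " first order:", complex(-phi, theta))
print("d_iab =", d_ab, " first order:", complex(phi, theta))
```

The deviations from the first-order values are of third order in the angles (a few times $10^{-6}$ here), which is why small $\theta$ and $\phi$ make $D^{+}_i \approx \mathcal{U}$.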

For circular polarisation coordinates (see equ 22):

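$$\begin{aligned} D^{\odot}_i = \mathcal{H} D^{+}_i \mathcal{H}^{-1} &= \left(\mathcal{H}\,\mathrm{Ell}(\theta_{ia}, \theta_{ib})\,\mathcal{H}^{-1}\right)\left(\mathcal{H}\,\mathrm{Rot}(\phi_{ia}, \phi_{ib})\,\mathcal{H}^{-1}\right) \\ &\approx \mathrm{Rot}(\theta_{ia})\,\mathrm{Diag}(e^{i\phi_{ia}}, e^{-i\phi_{ia}}) \end{aligned} \qquad (41)$$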

Again, the $\approx$ sign gives the approximation for $\phi_{ia} = \phi_{ib}$ and $\theta_{ib} = -\theta_{ia}$. See equation 34 for an expression for $(\mathcal{H}\,\mathrm{Rot}(\phi_{ia}, \phi_{ib})\,\mathcal{H}^{-1})$ where $\phi_{ia} \neq \phi_{ib}$. The expression for $(\mathcal{H}\,\mathrm{Ell}(\theta_{ia}, \theta_{ib})\,\mathcal{H}^{-1})$ with $\theta_{ib} \neq -\theta_{ia}$ is similar, but with real coefficients, as expected for circular polarisation coordinates.
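Equation 41 can be verified numerically. Equation 18 is not reproduced in this section, so the sketch below assumes $\mathcal{H} = \mathrm{Diag}(1, -i)\,\mathrm{Ell}(\pi/4, -\pi/4) = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}$, which is what the relation $\mathrm{Ell}(\pi/4, -\pi/4) = \mathrm{Diag}(1, i)\,\mathcal{H}$ of footnote 4 implies. It checks that $\mathcal{H}\,\mathrm{Rot}(\phi)\,\mathcal{H}^{-1} = \mathrm{Diag}(e^{i\phi}, e^{-i\phi})$ and $\mathcal{H}\,\mathrm{Ell}(\theta)\,\mathcal{H}^{-1} = \mathrm{Rot}(\theta)$, and that a pseudo-rotation with $\phi_{ia} \neq \phi_{ib}$ does not become diagonal.

```python
import cmath
import math


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def inv2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def max_diff(A, B):
    """Largest absolute element-wise difference between two 2x2 matrices."""
    return max(abs(A[r][c] - B[r][c]) for r in range(2) for c in range(2))


def rot(alpha, beta=None):
    """Rot(alpha, beta) of equation 59."""
    beta = alpha if beta is None else beta
    return [[math.cos(alpha), -math.sin(alpha)], [math.sin(beta), math.cos(beta)]]


def ell(alpha, beta=None):
    """Ell(alpha, beta) of equation 60."""
    beta = -alpha if beta is None else beta
    return [[math.cos(alpha), 1j * math.sin(alpha)], [-1j * math.sin(beta), math.cos(beta)]]


# Assumed linear-to-circular conversion, derived from footnote 4.
H = matmul([[1, 0], [0, -1j]], ell(math.pi / 4))
H_inv = inv2(H)


def to_circular(J):
    """Transform a Jones matrix from linear to circular coordinates: H J H^-1."""
    return matmul(matmul(H, J), H_inv)


theta, phi = 0.01, 0.02  # hypothetical receptor errors (radians)
D_circ = to_circular(matmul(ell(theta), rot(phi)))
expected = matmul(rot(theta), [[cmath.exp(1j * phi), 0], [0, cmath.exp(-1j * phi)]])
print("equation 41 mismatch:", max_diff(D_circ, expected))
print("off-diagonal of H Rot(0.1, 0.2) H^-1:", to_circular(rot(0.1, 0.2))[0][1])
```

With $\phi_{ib} = \phi_{ia}$ and $\theta_{ib} = -\theta_{ia}$ the $\approx$ in equation 41 is in fact exact to rounding error; the approximation lies only in assuming those receptor relations.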

5.8 Commutation ($Y_i$)

In some systems, the receptor signals can be switched (commuted) between IF-channels for calibration.

$$Y_i = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{or} \quad Y_i = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad (42)$$

5.9 Hybrid ($H_i$)

In some cases, circularly polarised receptors consist of linearly polarised dipoles, followed by a ‘hybrid’. The latter is an electronic implementation of the coordinate transformation matrix $ℋ$ from linear to circular polarisation coordinates:

$$H_i \approx \mathcal{H} \qquad (43)$$

See equation 18 for the definition of $\mathcal{H}$. If no hybrid is present, $H_i$ is the unit matrix. Any gain effects in these electronic components are ignored, or rather they are assumed to be ‘absorbed’ by the gain matrix $G_i$.

5.10 Electronic gain ($G_i$)

The matrix $G_i$ represents the product of all complex electronic gain effects per output IF-channel $p$ and $q$. It models the effects of all feed-based electronics (amplifiers, mixers, LO, cables etc.). (The correlator causes interferometer-based effects, which are discussed in section 3.)

$$G^{+}_i = G^{\odot}_i = \begin{pmatrix} g_{ipp} & g_{iqp} \\ g_{ipq} & g_{iqq} \end{pmatrix} \approx \begin{pmatrix} g_{ip} & 0 \\ 0 & g_{iq} \end{pmatrix} = \mathrm{Diag}(g_{ip}, g_{iq}) \qquad (44)$$

The $\approx$ sign indicates that electronic cross-talk is assumed to be absent in well-designed systems, i.e. $g_{ipq} = g_{iqp} = 0$. Where cross-talk does occur, it is not necessarily reciprocal, so in general $g_{ipq} \neq g_{iqp}$.

In reality, $G_i$ will be a product of many electronic gain matrices, one for each linear electronic component in the system: $G_i = G_i^{\text{LNA}} G_i^{\text{mixers}} G_i^{\text{cables}} G_i^{\text{IF-system}} \dots$ A solver will not be able to distinguish these different effects from each other, but the decomposition is useful for the simulation of instrumental effects.
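The degeneracy mentioned above is easy to demonstrate. In the sketch below (hypothetical gain values; only the component names follow the text), two different decompositions of the electronic chain into diagonal $G_i^{\text{LNA}}$, $G_i^{\text{mixers}}$ and $G_i^{\text{cables}}$ matrices produce the same total $G_i$, so a solver working on $G_i$ alone could not tell them apart, whereas a simulation can still assign the effects to individual components.

```python
import cmath


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def diag(a, b):
    """Diag(a, b) of equation 58."""
    return [[a, 0], [0, b]]


def total_gain(chain):
    """G_i as the ordered product G^LNA G^mixers G^cables ... of diagonal matrices."""
    G = diag(1, 1)
    for gp, gq in chain:
        G = matmul(G, diag(gp, gq))
    return G


# Hypothetical complex gains (channel p, channel q) per component: LNA, mixers, cables.
chain = [(10.0 * cmath.exp(0.1j), 9.5 * cmath.exp(-0.2j)),
         (0.80 * cmath.exp(0.5j), 0.82 * cmath.exp(0.3j)),
         (0.95 * cmath.exp(-1.0j), 0.90 * cmath.exp(-1.1j))]

# Move a complex factor from the mixers into the LNA: different 'physics', same product.
k = 2.0 * cmath.exp(0.7j)
alt_chain = [(chain[0][0] * k, chain[0][1] * k),
             (chain[1][0] / k, chain[1][1] / k),
             chain[2]]

G, G_alt = total_gain(chain), total_gain(alt_chain)
print(G)
print(G_alt)
```

Because diagonal matrices commute, even the order of the components is unobservable in $G_i$; only the per-channel products $g_{ip}$ and $g_{iq}$ can be solved for.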

5.11 Do we need a configuration matrix ($C_i$)?

NB: This section is a little polemical, and should disappear when things are more settled.

There has been some debate about the concept of a ‘configuration matrix’ $C_i$, as proposed by [2], which models the nominal feed configuration. It represents an idealised coordinate transformation ‘from the frame of the rotating antenna mount to the electronic voltage frame’. It models any rotation of the receptors w.r.t. ‘the antenna mount’, which must be added to the ‘parallactic’ rotation $P_i$ of the antenna w.r.t. the sky. $C_i$ also models the hybrid $H_i$ if present, but it ignores the primary beam $E_i$. Any deviations from this idealised behaviour are covered by the ‘leakage’ matrix $D_i$.

However, the proposed $C_i$ is most suitable for the special case of fully steerable parabolic antennas. The introduction of an intermediate antenna coordinate frame seems an unnecessary complication in those cases where the mirror is not steerable, or is absent entirely (as in a dipole array). Moreover, $C_i$ violates the rules of modelling by lumping together two effects that have nothing to do with each other, and do not even occur at the same point in the signal path.

In principle it is a good idea to have one matrix that models the transition from electric fields (V/m) to electric voltages (V), and this is precisely what $B_i$ does. This very general matrix can, where relevant, be split up into sub-matrices like $P_i$, $E_i$ and $D_i$. The matrix $H_i$ has no part in this, since it represents a rearrangement of electronic signals (V), just like $Y_i$ (and will come after $Y_i$ if present!). The projection matrix $P_i$ takes care of the entire orientation angle of the receptors w.r.t. the sky, which is the only thing that really counts.

6 THE ORDER OF JONES MATRICES

The Jones matrices in equation 3 generally do not commute, so their order is important. In principle, the matrices must be placed in the ‘physical’ order, i.e. the order of the signal propagation path. But in the equations that are enshrined in existing reduction packages, this is often not the case. This raises the question why these ‘wrong’ equations seem to produce so many good (even spectacular) results. The question is especially important since a different order often results in considerable gains in computational efficiency.

The answer is that, for existing (arrays of) circularly symmetric parabolic feeds, many Jones matrices can be approximated by matrices that do commute with at least some of the others.

6.1 Overview of commutation properties

We will analyse this in terms of the special matrices defined in Appendix A.4, whose commutation properties are:

- Unit matrices $\mathcal{U}$ commute with all matrices.
- Multiplication matrices $\mathrm{Mult}(a)$, i.e. diagonal matrices with equal elements $a$, are equivalent to a multiplicative factor. Therefore, they commute with all matrices.
- Diagonal matrices $\mathrm{Diag}(a, b)$ with unequal elements $a, b$ commute with each other.
- Pure rotation matrices $\mathrm{Rot}(\alpha)$ commute with each other.
- Pseudo rotation matrices $\mathrm{Rot}(\alpha, \beta)$ do not commute with each other or with pure rotation matrices $\mathrm{Rot}(\alpha)$. Moreover, there should only be one pseudo rotation matrix in the chain, and it should be to the left of (i.e. after) all other rotation matrices: $\mathrm{Rot}(\alpha, \beta)\,\mathrm{Rot}(\gamma) = \mathrm{Rot}(\alpha+\gamma, \beta+\gamma) \neq \mathrm{Rot}(\gamma)\,\mathrm{Rot}(\alpha, \beta)$.
- Ellipticity matrices $\mathrm{Ell}(\alpha, \beta)$ do not commute with each other, except when $\beta = -\alpha$. Moreover: $\mathrm{Ell}(\alpha, \beta)\,\mathrm{Ell}(\gamma, -\gamma) = \mathrm{Ell}(\alpha+\gamma, \beta-\gamma) \neq \mathrm{Ell}(\gamma, -\gamma)\,\mathrm{Ell}(\alpha, \beta)$.
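The commutation properties listed above can be spot-checked numerically. The sketch below uses the definitions of equations 58 to 60 with arbitrary (hypothetical) angles; each entry of `checks` corresponds to one item of the list, including the two product rules for pseudo rotations and ellipticities.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def diag(a, b):
    return [[a, 0], [0, b]]


def mult(a):
    return diag(a, a)


def rot(alpha, beta=None):
    beta = alpha if beta is None else beta
    return [[math.cos(alpha), -math.sin(alpha)], [math.sin(beta), math.cos(beta)]]


def ell(alpha, beta=None):
    beta = -alpha if beta is None else beta
    return [[math.cos(alpha), 1j * math.sin(alpha)], [-1j * math.sin(beta), math.cos(beta)]]


def close(A, B, tol=1e-12):
    """True if two 2x2 matrices agree element-wise within tol."""
    return all(abs(A[r][c] - B[r][c]) < tol for r in range(2) for c in range(2))


def commutes(A, B):
    return close(matmul(A, B), matmul(B, A))


checks = {
    "Mult commutes with a pseudo rotation": commutes(mult(2.5), rot(0.3, 0.7)),
    "Diag commutes with Diag": commutes(diag(1.0, 2.0), diag(3.0, -1.0)),
    "Rot commutes with Rot": commutes(rot(0.3), rot(1.1)),
    "pseudo Rot does not commute with Rot": not commutes(rot(0.3, 0.7), rot(1.1)),
    "pure Ell commutes with pure Ell": commutes(ell(0.3), ell(0.5)),
    "general Ell does not commute with Ell": not commutes(ell(0.3, 0.1), ell(0.5)),
    "Rot(a,b) Rot(g) = Rot(a+g, b+g)": close(matmul(rot(0.3, 0.7), rot(1.1)), rot(1.4, 1.8)),
    "Ell(a,b) Ell(g,-g) = Ell(a+g, b-g)": close(matmul(ell(0.3, 0.1), ell(0.5, -0.5)),
                                               ell(0.8, -0.4)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```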

In order to study the general implications of changing the order of multiplication, we take the two products $mM$ and $Mm$ of two general matrices (whose elements may be complex):

$$\begin{aligned} \begin{pmatrix} a & c \\ d & b \end{pmatrix} \begin{pmatrix} A & C \\ D & B \end{pmatrix} &= \begin{pmatrix} aA + cD & aC + cB \\ dA + bD & dC + bB \end{pmatrix} \\ \begin{pmatrix} A & C \\ D & B \end{pmatrix} \begin{pmatrix} a & c \\ d & b \end{pmatrix} &= \begin{pmatrix} aA + dC & cA + bC \\ aD + dB & cD + bB \end{pmatrix} \end{aligned} \qquad (45)$$

The difference (i.e. commutation error) between the two matrix products can be expressed as a matrix $\Delta$:

$$mM = Mm + \Delta = Mm + \begin{pmatrix} cD - dC & -c(A-B) + C(a-b) \\ d(A-B) - D(a-b) & -(cD - dC) \end{pmatrix} \qquad (46)$$

Thus, by taking the wrong matrix order, one makes fractional errors in the result of the following order:
- in the diagonal elements: of the order of $c/a$, i.e. the ratio of off-diagonal and diagonal elements of the original matrices (which is often small);
- in the off-diagonal elements: of the order of $(a-b)/a$, i.e. they will be smaller as the diagonal elements of the original matrices are more nearly equal.

If one of the two matrices is diagonal, e.g. $c = d = 0$, then this reduces to:

$$mM = Mm + \begin{pmatrix} 0 & C(a-b) \\ D(b-a) & 0 \end{pmatrix} \qquad (47)$$

The (not very surprising) conclusion is that the error caused by taking the wrong matrix order is smaller when one of the matrices is diagonal and the values of its diagonal elements are almost equal.
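Equations 46 and 47 can be checked against a direct computation. The sketch below, with randomly drawn (seeded) complex matrices, compares $mM - Mm$ with the closed-form $\Delta$ of equation 46, then repeats the comparison for a diagonal $m$ (equation 47) whose elements differ by a hypothetical fraction $\delta$, showing that the commutation error scales linearly with $a - b$.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def delta(m, M):
    """Closed-form commutation error mM - Mm of equation 46.

    The element layout follows equation 45: m = [[a, c], [d, b]], M = [[A, C], [D, B]].
    """
    (a, c), (d, b) = m
    (A, C), (D, B) = M
    return [[c * D - d * C, -c * (A - B) + C * (a - b)],
            [d * (A - B) - D * (a - b), -(c * D - d * C)]]


def commutator(m, M):
    mM, Mm = matmul(m, M), matmul(M, m)
    return [[mM[r][c] - Mm[r][c] for c in range(2)] for r in range(2)]


def max_abs(X):
    return max(abs(x) for row in X for x in row)


rng = random.Random(42)


def rand_c():
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))


m = [[rand_c(), rand_c()], [rand_c(), rand_c()]]
M = [[rand_c(), rand_c()], [rand_c(), rand_c()]]
err46 = max_abs([[x - y for x, y in zip(r1, r2)]
                 for r1, r2 in zip(commutator(m, M), delta(m, M))])
print("equation 46 mismatch:", err46)

# Equation 47: m diagonal with a = 1, b = 1 - dlt.
for dlt in (0.1, 0.01, 0.001):
    m_diag = [[1.0, 0.0], [0.0, 1.0 - dlt]]
    print(f"delta={dlt}: max |mM - Mm| = {max_abs(commutator(m_diag, M)):.2e}")
```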

6.2 Overview of Jones matrix forms

It is sufficient to discuss the commutation properties of the feed-based Jones matrices because, if $A_i$ commutes with $B_i$ and $A_j$ with $B_j$, then $(A_i \otimes A_j^*)$ commutes with $(B_i \otimes B_j^*)$:

$$(J_i \otimes J_j^*) = (A_i B_i \dots Z_i) \otimes (A_j B_j \dots Z_j)^* = (A_i \otimes A_j^*)(B_i \otimes B_j^*) \dots (Z_i \otimes Z_j^*) \qquad (48)$$

Inspecting the various Jones matrices separately:

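| Matrix | Form | Expression | Condition |
|---|---|---|---|
| $F^{+}_i$ | pure rotation | $\mathrm{Rot}(\chi_i)$ | |
| $F^{\odot}_i$ | diagonal matrix | $\mathrm{Diag}(e^{i\chi_i}, e^{-i\chi_i})$ | |
| $T^{+}_i, T^{\odot}_i$ | multiplication | $\mathrm{Mult}(t_i)$ | |
| $K_i$ | multiplication | $\mathrm{Mult}(e^{i\vec{\rho}\cdot\vec{r}_i})$ | $\vec{r}_{ia} = \vec{r}_{ib}$ (virtually always the case) |
| $P^{+}_i$ | pure rotation | $\mathrm{Rot}(\gamma_{xa})$ | $\gamma_{xa} = \gamma_{yb}$ |
| $P^{\odot}_i$ | diagonal matrix | $\mathrm{Diag}(e^{i\gamma_{xa}}, e^{-i\gamma_{xa}})$ | $\gamma_{xa} = \gamma_{yb}$ |
| $P^{+}_i$ | pseudo-rotation | $\mathrm{Rot}(\gamma_{xa}, \gamma_{yb})$ | $\gamma_{xa} \neq \gamma_{yb}$ |
| $P^{\odot}_i$ | a general matrix | | $\gamma_{xa} \neq \gamma_{yb}$ |
| $E^{+}_i, E^{\odot}_i$ | diagonal matrix | $\mathrm{Diag}(e_{iaa}, e_{ibb})$ | if no cross-leakage ($e_{iab} = e_{iba} = 0$) |
| | multiplication | $\mathrm{Mult}(e_i)$ | if also $e_{iaa} = e_{ibb}$ for all $\vec{\rho}$ |
| $D^{+}_i, D^{\odot}_i$ | $\approx$ unit matrix | $\mathcal{U}$ | if small leakage, i.e. $d_{iab} \approx d_{iba} \approx 0$ |
| $D^{+}_i$ | $\mathrm{Ell}(\theta_{ia}, \theta_{ib})\,\mathrm{Rot}(\phi_{ia}, \phi_{ib})$ | $\approx \mathrm{Ell}(\theta_{ia}, -\theta_{ia})\,\mathrm{Rot}(\phi_{ia})$ | $\theta_{ib} = -\theta_{ia}$ and $\phi_{ib} = \phi_{ia}$ |
| $D^{\odot}_i$ | $(\mathcal{H}\,\mathrm{Ell}(\theta_{ia}, \theta_{ib})\,\mathcal{H}^{-1})(\mathcal{H}\,\mathrm{Rot}(\phi_{ia}, \phi_{ib})\,\mathcal{H}^{-1})$ | $\approx \mathrm{Rot}(\theta_{ia})\,\mathrm{Diag}(e^{i\phi_{ia}}, e^{-i\phi_{ia}})$ | $\theta_{ib} = -\theta_{ia}$ and $\phi_{ib} = \phi_{ia}$ |
| $[Y_i]$ | anti-diagonal matrix: a problem, if present | | |
| $[H_i]$ | effectively hidden if present, see equation 24 | | |
| $G_i$ | diagonal matrix | $\mathrm{Diag}(g_{ipp}, g_{iqq})$ | if no cross-talk |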

Problems are caused predominantly by matrices with non-zero off-diagonal elements, like $D_i$, $Y_i$, and $P_i$ if $\gamma_{xa} \neq \gamma_{yb}$. Of these, only $D_i$ is present in all telescopes. $P_i$ will be a problem for SKAI, because there $\gamma_{xa} \neq \gamma_{yb}$.

6.3 Allowable changes of order

The following changes in the order of Jones matrices are allowed, but only under the indicated conditions. NB: Some Jones matrices will commute if it can be assumed that the observed source is compact, dominating, unpolarised and near the centre of the field. This is often the case.

- If the Faraday angle does not vary over the primary beam, $F_i$ might be applied in the uv-plane. $F_i$ will in general commute with $P_i$, except when $P_i$ is a pseudo-rotation ($\gamma_{xa} \neq \gamma_{yb}$). $F^{\odot}_i$ is diagonal, and will commute with $E_i$ if the latter is diagonal. But $F^{+}_i$ will only commute with $E_i$ if the latter is a multiplication. If there is appreciable cross-leakage, $F_i$ should stay to the right of $D_i$, which means that in that case $F^{\odot}_i$ cannot be lumped with $G_i$, as is often done.
- $T_i$ is a multiplication, which commutes with everything. If it does not vary over the primary beam, it can be lumped with $G_i$.
- If the two receptors of a feed are located at the same position (which is virtually always the case), the FT kernel matrix $K_i(\vec{\rho}\cdot\vec{r}_i)$ reduces to a multiplication $k_i(\vec{\rho}\cdot\vec{r}_i)$. This means that the FT can be performed at any desired place in the chain, even to the right of the Stokes matrix. NB: If $\vec{r}_{ia} \neq \vec{r}_{ib}$, it would not be trivial to figure out what the correct position of $K_i$ should be.
- If the map centre $\vec{\rho}_{mc}$ is different from the fringe tracking centre $\vec{\rho}_{ftc}$, the FT kernel may be split into a product: $K_i(\vec{\rho}\cdot\vec{r}_i) = K^{0}_i(\vec{\rho}_{mc}\cdot\vec{r}_i)\,K'_i((\vec{\rho} - \vec{\rho}_{mc})\cdot\vec{r}_i)$. Since $\vec{\rho}_{mc}$ does not depend on source position, $K^{0}_i(\vec{\rho}_{mc}\cdot\vec{r}_i)$ may be moved to the leftmost part of the chain, i.e. to the uv-plane part.
- If $E_i(\vec{\rho}) = E_j(\vec{\rho}) = \mathrm{Mult}(e(\vec{\rho}))$, i.e. if all voltage patterns are identical, then $E_{ij} = (E_i \otimes E_j^*)$ commutes with the Stokes matrix $S$ and may be applied directly to the Stokes vector $\vec{I}$ in the image plane. This condition is more likely to be met near the beam centre. NB: Because $E_{ij}$ definitely does not commute with $S$ if $e_{iaa} \neq e_{ibb}$, the justification for the practice of applying off-axis instrumental polarisation to $\vec{I}$ seems a little doubtful.
- $P_i$ may be moved to the left of $E_i$ if they are both diagonal matrices, or if $E_i$ is a multiplication. Since $P^{\odot}_i$ is diagonal and $P^{+}_i$ is not (except for equatorial mounts), this appears to be an argument in favour of the use of circular polarisation coordinates. If $E_i$ is diagonal and almost a multiplication (i.e. $e_{iaa} \approx e_{ibb}$), $P^{+}_i$ may be moved to the left of $E_i$ at the cost of a small error of the order $(e_{iaa} - e_{ibb})/e_{iaa}$ (see equation 47).
- If $P_i$ and $E_i$ do not commute at all, one can still move $P_i$ to the left of $E_i$ by using $E_i P_i = P_i (P_i^{-1} E_i P_i) = P_i E''_i$. Since this re-introduces time-dependent off-diagonal elements into $E''_i$, it is not clear how useful this is.
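The last two items of this list can be illustrated for a pure rotation $P^{+}_i = \mathrm{Rot}(\gamma)$ and a diagonal beam $E_i = \mathrm{Diag}(e_{iaa}, e_{ibb})$, with hypothetical values for $\gamma$ and the beam elements. By equation 47 the swap error is $|\sin\gamma|\,|e_{iaa} - e_{ibb}|$, and the exact alternative $E''_i = P_i^{-1} E_i P_i$ acquires off-diagonal elements $\sin\gamma\cos\gamma\,(e_{ibb} - e_{iaa})$, which vary with $\gamma$ and therefore with time as the projected receptor angle changes.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def rot(alpha):
    """Pure rotation Rot(alpha) of equation 59."""
    return [[math.cos(alpha), -math.sin(alpha)], [math.sin(alpha), math.cos(alpha)]]


def max_abs(X):
    return max(abs(x) for row in X for x in row)


gamma = 0.6                  # hypothetical projected receptor angle (rad)
e_aa, e_bb = 0.80, 0.79      # hypothetical, nearly equal beam values at one sky position
P = rot(gamma)
E = [[e_aa, 0.0], [0.0, e_bb]]

EP, PE = matmul(E, P), matmul(P, E)
swap_error = max_abs([[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(EP, PE)])
print(f"swap error {swap_error:.4f}, fractional {swap_error / e_aa:.4f}")

# Exact re-ordering: E P = P (P^-1 E P) = P E''.  For a rotation, P^-1 = Rot(-gamma).
E2 = matmul(matmul(rot(-gamma), E), P)
print("off-diagonal elements of E'':", E2[0][1], E2[1][0])
```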

6.4 VisJones and SkyJones

The Jones matrices may be split into two groups: $J_i = J^{vis}_i J^{sky}_i$. In these terms, the full M.E. (ignoring normalisation factors, see equ 6) becomes:

$$\vec{V}_{ij} = \int dt \int df\, (J^{vis}_i \otimes J^{vis*}_j) \sum_{k} \int dl\, dm\, (J^{sky}_i \otimes J^{sky*}_j)\, S\, \vec{I}_k \qquad (49)$$

We now see the reason for placing the integration over $f$ and $t$ to the left of the sum over the $k$ sources. Since it is computationally advantageous to minimise the number of Jones matrices that operate in the image plane, it must be investigated whether Jones matrices that do not depend on the source position can be moved to the left in the chain, using the rules in section 6.3 above. Depending on the chosen coordinate system (and always keeping in mind the conditions for re-ordering Jones matrices), the following split appears to be the maximum obtainable:

$$\begin{aligned} J^{vis}_i &= K^{0}_i\,(G_i T_i)\,D^{+}_i P^{+}_i F^{+}_i \qquad (\text{using } S = S^{+}) \qquad (50) \\ &= K^{0}_i\,(G_i T_i F^{\odot}_i)\,D^{\odot}_i P^{\odot}_i \qquad (\text{using } S = S^{\odot}) \qquad (51) \\ J^{sky}_i &= E_i K'_i \qquad (52) \end{aligned}$$
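The computational point of equation 49 can be sketched for a single baseline, a single time and frequency sample, and discrete point sources. The coherency vectors $S\vec{I}_k$ and all Jones matrices below are hypothetical random values (no particular form of $S$ is assumed). Applying the direction-independent $J^{vis}$ once, outside the source sum, gives the same $\vec{V}_{ij}$ as applying the full chain inside the sum for every source, but with one $4\times4$ matrix-vector product instead of one per source.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def matvec(A, v):
    return [sum(A[r][k] * v[k] for k in range(len(v))) for r in range(len(A))]


def kron(A, B):
    """Kronecker product of two 2x2 matrices."""
    return [[A[r // 2][c // 2] * B[r % 2][c % 2] for c in range(4)] for r in range(4)]


def conj(A):
    return [[x.conjugate() for x in row] for row in A]


rng = random.Random(3)


def rand_c():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))


def rand_jones():
    return [[rand_c(), rand_c()], [rand_c(), rand_c()]]


n_src = 5
J_vis_i, J_vis_j = rand_jones(), rand_jones()                      # direction-independent
J_sky_i = [rand_jones() for _ in range(n_src)]                     # one per source k
J_sky_j = [rand_jones() for _ in range(n_src)]
coherency = [[rand_c() for _ in range(4)] for _ in range(n_src)]   # stands in for S I_k

# Naive: the full chain J_i = J_vis_i J_sky_ik applied per source.
naive = [0j] * 4
for k in range(n_src):
    full = kron(matmul(J_vis_i, J_sky_i[k]), conj(matmul(J_vis_j, J_sky_j[k])))
    naive = [x + y for x, y in zip(naive, matvec(full, coherency[k]))]

# Split form of equation 49: sum the sky part first, then apply J_vis once.
sky_sum = [0j] * 4
for k in range(n_src):
    sky = kron(J_sky_i[k], conj(J_sky_j[k]))
    sky_sum = [x + y for x, y in zip(sky_sum, matvec(sky, coherency[k]))]
split = matvec(kron(J_vis_i, conj(J_vis_j)), sky_sum)

print("max difference:", max(abs(x - y) for x, y in zip(naive, split)))
```

The equality follows from equation 48 and linearity; what the split buys is that everything to the left of the source sum is evaluated once per baseline rather than once per source.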

This is what is done implicitly in some existing reduction packages.

6.4.1 Tied Array

For a tied array (ignoring integration and weight factors for the moment), equation 5 becomes:

$$\vec{V}_{ij} = (Q_i \otimes Q_j^*) \sum_{n} \sum_{m} (J^{vis}_{in} \otimes J^{vis*}_{jm}) \sum_{k} (J^{sky}_{ink} \otimes J^{sky*}_{jmk})\, S\, \vec{I}_k \qquad (53)$$

Under extremely favourable conditions, i.e. if:
- the individual feed beams per tied array are identical,
- Faraday rotation is the same for an entire tied array,
- all receptors of a tied array have the same orientation,
- receptor cross-leakages are small,
- tied-array feed signals are corrected before adding,
- there are no delay errors,
then equation 53 can be reduced to:

$$\vec{V}_{ij} = (Q_i \otimes Q_j^*)(P_i \otimes P_j^*)(F_i \otimes F_j^*) \sum_{k} (E_{ik} \otimes E_{jk}^*) \sum_{n} \sum_{m} (K_{ink} \otimes K_{jmk}^*)\, S\, \vec{I}_k \qquad (54)$$

References

[1] J.D.Bregman, J.E.Noordam Matrix formalism for Interferometric Polarisation Calibration. Internal proposal to AIPS++ project, April 1993.

[2] J.P.Hamaker, J.D.Bregman, R.J.Sault Understanding Radio Polarimetry I: Mathematical foundations. Accepted by Astronomy and Astrophysics, Sept 1995. (For a preprint, see http://www.nfra.nl/~hamaker).

[3] R.J.Sault, J.P.Hamaker, J.D.Bregman Understanding Radio Polarimetry II: Instrumental calibration of an interferometer array. Accepted by Astronomy and Astrophysics, Sept 1995. (For a preprint, see http://www.nfra.nl/~hamaker).

[4] J.P.Hamaker, J.D.Bregman Understanding Radio Polarimetry III: Interpreting the IAU/IEEE definitions of the Stokes parameters. Submitted to Astronomy and Astrophysics, Oct 1995. (For a preprint, see http://www.nfra.nl/~hamaker).

[5] J.E.Noordam Some practical aspects of the matrix-based Measurement Equation of a generic radio telescope. AIPS++ Implementation note 182 (June 1995)

[6] T.J.Cornwell Calibration and Imaging using the Measurement Equation for the Generic Interferometer. AIPS++ Implementation note 183 (July 1995)

[7] T.J.Cornwell The Generic Interferometer I: Overview of Calibration and Imaging AIPS++ Implementation note 183 (August 1995)

[8] T.J.Cornwell The Generic Interferometer II: Image Solvers AIPS++ Implementation note ... (revised version, Aug 1995) developing

[9] T.J.Cornwell The Generic Interferometer III: Analysis of Calibration and Imaging AIPS++ Implementation note ... (Nov 1995) developing

[10] T.J.Cornwell, M.H.Wieringa The Generic Interferometer IV: Design of Calibration and Imaging AIPS++ Implementation note ... (Dec 1995) developing

[11] T.J.Cornwell The Generic Interferometer V: Speciﬁcation of Calibration and Imaging AIPS++ Implementation note ... (Sept 1995) developing

[12] A.R.Thompson, J.M.Moran, G.W.Swenson Interferometry and Synthesis in Radio Astronomy. John Wiley and Sons (1986)

[13] R.A.Perley, F.R.Schwab, A.H.Bridle Synthesis Imaging in Radio Astronomy. Astronomical Society of the Paciﬁc Conference Series, Vol 6 (1989)

A APPENDIX: CONVENTIONS

A consistent nomenclature and precise definitions are extremely important for a software package like AIPS++, which aspires to be a ‘world reduction package’, and to which workers with a large spacetime separation are supposed to contribute. One of the most sensitive areas in this respect is the Measurement Equation, which underlies the central subject of uv-calibration and imaging.

However, it is not easy to define, adopt and enforce the use of a suitable set of conventions. This appendix is a hopefully useful step in that process. It proposes coordinate conventions and some definitions (notably the one for feed!), and lists symbols that have been defined in a separate TeX file (referred to as \include(megi-symbols) in this LaTeX document). The TeX syntax is shown in small print (e.g. \FeedI), for easy reference.

A.1 Some definitions

The following definitions are displayed in a distinctive font throughout the text of this document in order to emphasise that they have been defined explicitly.

A receptor (\Receptor) converts the incident electric ﬁeld into a voltage.
An IF-channel (\IFchannel) is one of the two output signals of a feed, one for each ‘polarisation’. NB: The signals in a pair of IF-channels may be a linear combination of the signals of the two receptors.
A feed (\Feed) is the most fundamental concept of the M.E. formalism, since Jones-matrices are feed-based. Although a feed may sometimes have only one receptor, it usually has two, which is necessary and sufficient to fully sample the incident e.m. field. Each feed is modelled by its own Jones matrix. NB: A feed is a logical concept. Thus, the same physical feed may be involved in several logical feeds, e.g. for different beams in a multi-beam instrument, or for different spectral windows.
An antenna (\Antenna) is a physical grouping of feeds. NB: As a concept, it tends to play a rather confusing role in the M.E. discussions.
An interferometer (\Interferometer) is the combination of two feeds. Its output is a visibility of 1-4 spectra, depending on the number of IF-channels per feed. NB: Sometimes the combination of two individual IF-channels is also called an interferometer. In that case, its output is a single spectrum.
A telescope (\Telescope) is an entire instrument. It can be a single dish (e.g. GBT) or an aperture synthesis array (e.g. ATCA).
A projected (\Projected) angle is an angle projected on the plane perpendicular to the propagation direction (the $z$-axis).

A.2 Labels, sub- and super-scripts

| Symbol | TeX | Meaning |
|---|---|---|
| $i, j$ | \FeedI,\FeedJ | feed labels |
| $a, b$ | \RcpA,\RcpB | receptor labels, two per feed |
| $p, q$ | \IFP,\IFQ | IF-channel labels, two per feed |
| $r, l$ | \RPol,\LPol | circular polarisation (right, left) |
| $x, y$ | \XPol,\YPol | linear polarisation (N-S, E-W) |
| $A^{+}, A^{\odot}$ | A\ssLin,A\ssCir | superscripts for linear and circular polarisation |
| $A_i, A_{ij}$ | A\ssI,A\ssIJ | feed subscripts |

The subscript convention of matrix elements is as follows: $Y_{ibp}$ refers to a matrix element of matrix $Y$ for feed $i$, which models the coupling of the signal going from receptor $b$ to IF-channel $p$.

A.3 Coordinate frames

Fig 1 gives an overview of the coordinate system(s) used. All angles on the Sky are measured counter-clockwise, i.e. in the direction North through East. When relevant, ‘axis’ means ‘positive axis’ (e.g. the positive $x$-axis). It is important to make a distinction between:

The beam frame(s): In order to calculate the effects of the primary beam on the signal of a source in direction $\vec{\rho}(l, m)$, the shape and position of the voltage beams of each receptor on the Sky have to be calculated. For fully steerable parabolic antennas, which have constant beam shapes, this can be done most conveniently in coordinate frames defined by the projected position angles of the receptors. To allow for the fact that the two beams of a feed are closely coupled, an intermediate feed-frame is also defined.

The electrical frame: For the polarisation of the signal, the only relevant parameters are the projected angles w.r.t. the ‘electrical’ axes $x$ and $y$ defined by the IAU.

NB: In order to see that two frames are needed, consider that Faraday rotation rotates the electric vector, but not the beam on the sky.

Frame of the entire telescope (single dish or array):

| Symbol | TeX | Meaning |
|---|---|---|
| $\vec{r}$ | \vvAntPos | Projected feed (receptor?) position vector |
| $u, v, w$ | \ccU,\ccV,\ccW | Projected baseline coordinates |
| $\vec{u}$ | \vvUVW | Projected baseline vector $\vec{u}(u, v, w)$ |

Electrical frame on the sky (IAU definition):

| Symbol | TeX | Meaning |
|---|---|---|
| $x, y$ | \ccX,\ccY | IAU electrical frame on the sky |
| $z$ | \ccZ | Propagation direction of incident field |
| $\gamma_{xy}$ | \aaXY | Angle from $x$-axis to $y$-axis ($= \pi/2$) |
| $x, y$ | \ccXPol,\ccYPol | Linear polarisation coordinates |
| $r, l$ | \ccRPol,\ccLPol | Circular polarisation coordinates |

Sky frame (w.r.t. fringe stopping centre):

| Symbol | TeX | Meaning |
|---|---|---|
| $l, m, n$ | \ccL,\ccM,\ccN | Coordinates (direction cosines) |
| $\vec{\rho}$ | \vvLMN | Source direction vector $\vec{\rho}(l, m)$ |
| $\vec{\rho}_{ftc}$ | \vvFTC | Fringe Tracking Centre $\vec{\rho}_{ftc}(RA, DEC, f)$ |
| $\vec{\rho}_{mc}$ | \vvMC | Map Centre $\vec{\rho}_{ftc}(l, m)$ |
| $\gamma_{lm}$ | \aaLM | Angle from $l$-axis to $m$-axis ($= \pi/2$) |
| $\gamma_{lx}$ | \aaLX | Angle from $l$-axis to $x$-axis ($= \pi/2$) |

Coordinate frame of feed $i$, projected on the sky:

| Symbol | TeX | Meaning |
|---|---|---|
| $l'_i, m'_i$ | \ccLI,\ccMI | Coordinates |
| $l_{i0}, m_{i0}$ | \ccLIO,\ccMIO | Origin ($l, m$) of feed-frame |
| $\gamma_{li}$ | \aaLI | Angle from $l$-axis to $l'_i$-axis |
| $\gamma_{xi}$ | \aaXI | Angle from $x$-axis to $l'_i$-axis ($= -\gamma_{lx} + \gamma_{li}$) |

Coordinate frame of receptor $a$ of feed $i$, projected on the sky:

| Symbol | TeX | Meaning |
|---|---|---|
| $l''_{ia}, m''_{ia}$ | \ccLIA,\ccMIA | Coordinates |
| $l'_{ia0}, m'_{ia0}$ | \ccLIAO,\ccMIAO | Origin ($l'_i, m'_i$) of receptor-frame |
| $\gamma_{ia}$ | \aaIA | Angle from $l'_i$-axis to $l''_{ia}$-axis |
| $\gamma_{xa}$ | \aaXA | Angle from $x$-axis to $l''_{ia}$-axis ($= -\gamma_{lx} + \gamma_{li} + \gamma_{ia}$) |

Coordinate frame of receptor $b$ of feed $i$, projected on the sky:

| Symbol | TeX | Meaning |
|---|---|---|
| $l''_{ib}, m''_{ib}$ | \ccLIB,\ccMIB | Coordinates |
| $l'_{ib0}, m'_{ib0}$ | \ccLIBO,\ccMIBO | Origin ($l'_i, m'_i$) of receptor-frame |
| $\gamma_{ib}$ | \aaIB | Angle from $l'_i$-axis to $l''_{ib}$-axis |
| $\gamma_{yb}$ | \aaYB | Angle from $y$-axis (!) to $l''_{ib}$-axis ($= -\gamma_{xy} - \gamma_{lx} + \gamma_{li} + \gamma_{ib}$) |

Fig 1: A (rather crowded) overview of the various coordinate frames for the Measurement Equation. See also the text. The origin of the Sky frame ($l, m$) is defined by the fringe stopping centre. The origin of the feed-frame ($l'_i, m'_i$) is defined by the pointing centre of feed $i$. The ‘pointing centres’ of the voltage beams of receptors $a$ and $b$ (marked with a and b) define the origins of the receptor-frames ($l''_{ia}, m''_{ia}$) and ($l''_{ib}, m''_{ib}$). The shapes and position offsets of these voltage beams are exaggerated, in order to emphasise that they do not necessarily coincide.

The coordinates $l''_{ia}, m''_{ia}$ and $l''_{ib}, m''_{ib}$ of the frames of receptors $a$ and $b$ in equ 37 are related to the celestial coordinate frame $l, m$ in a two-step process. First we define an intermediate feed-frame $l'_i, m'_i$ for feed $i$, projected on the Sky:

$$\begin{pmatrix} l'_i \\ m'_i \end{pmatrix} = \mathrm{Rot}(\gamma_{li}) \begin{pmatrix} l - l_{i0} \\ m - m_{i0} \end{pmatrix} \qquad (55)$$

in which $(l_{i0}, m_{i0})$ is the Pointing Centre of feed $i$, and $\mathrm{Rot}(\gamma_{li})$ is a rotation over the projected angle $\gamma_{li}$ between the positive $l$-axis of the Sky frame and the $l'_i$-axis of the feed-frame.

The voltage beams themselves are best modelled in a receptor-frame (see equ 37), again projected on the Sky. For receptor $a$ we have:

$$\begin{pmatrix} l''_{ia} \\ m''_{ia} \end{pmatrix} = \mathrm{Rot}(\gamma_{ia}) \begin{pmatrix} l'_i - l'_{ia0} \\ m'_i - m'_{ia0} \end{pmatrix} \qquad (56)$$

The matrix $\mathrm{Rot}(\gamma_{ia})$ represents a rotation over the angle $\gamma_{ia}$ between the positive $l'_i$-axis of the feed-frame and the $l''_{ia}$-axis of the relevant receptor-frame. For receptor $b$:

$$\begin{pmatrix} l''_{ib} \\ m''_{ib} \end{pmatrix} = \mathrm{Rot}(\gamma_{ib}) \begin{pmatrix} l'_i - l'_{ib0} \\ m'_i - m'_{ib0} \end{pmatrix} \qquad (57)$$

$(l'_{ia0}, m'_{ia0})$ and $(l'_{ib0}, m'_{ib0})$ represent pointing offsets of receptors $a$ and $b$ respectively. These can be used to model ‘beam-squint’ of feeds that are not axially symmetric.
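A minimal sketch of the two-step transformation of equations 55 to 57, plus the angle relation for $\gamma_{xa}$ listed in A.3. All numerical values are hypothetical; a non-zero receptor pointing offset models beam squint, so a source at the feed's pointing centre ends up slightly off-centre in each receptor frame.

```python
import math


def rot_apply(alpha, l, m):
    """Apply Rot(alpha) of equation 59 to the column vector (l, m)."""
    return (math.cos(alpha) * l - math.sin(alpha) * m,
            math.sin(alpha) * l + math.cos(alpha) * m)


def feed_frame(l, m, l_i0, m_i0, gamma_li):
    """Equation 55: sky (l, m) -> feed-frame (l'_i, m'_i)."""
    return rot_apply(gamma_li, l - l_i0, m - m_i0)


def receptor_frame(lp, mp, lp_0, mp_0, gamma_ir):
    """Equations 56/57: feed-frame (l'_i, m'_i) -> receptor-frame (l''_ir, m''_ir)."""
    return rot_apply(gamma_ir, lp - lp_0, mp - mp_0)


def gamma_xa(gamma_lx, gamma_li, gamma_ia):
    """Angle from the x-axis to the l''_ia-axis, as listed in Appendix A.3."""
    return -gamma_lx + gamma_li + gamma_ia


# Hypothetical feed: pointing centre, feed-frame rotation and a small beam squint.
l_i0, m_i0, gamma_li = 0.010, 0.000, 0.3
squint = {"a": (0.0005, 0.0, 0.0), "b": (-0.0005, 0.0, math.pi / 2)}  # (l'_0, m'_0, gamma_ir)

lp, mp = feed_frame(l_i0, m_i0, l_i0, m_i0, gamma_li)   # source at the pointing centre
for name, (lp_0, mp_0, g) in squint.items():
    print(name, receptor_frame(lp, mp, lp_0, mp_0, g))
print("gamma_xa =", gamma_xa(math.pi / 2, gamma_li, 0.0))
```

The resulting $(l''_{ia}, m''_{ia})$ and $(l''_{ib}, m''_{ib})$ are exactly the arguments needed to evaluate a receptor beam such as equation 37.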

A.4 Matrices and vectors

The following matrices and vectors play a role in the Measurement Equation:

| Symbol | TeX | Meaning |
|---|---|---|
| $\vec{I}$ | \vvIQUV | Stokes vector of the source (I,Q,U,V) |
| $\vec{V}, v$ | \vvCoh,\vvCohEl | Coherency vector, and one of its elements |
| $S$ | \mmStokes | Stokes matrix, conversion between polarisation representations |
| $S^{+}$ | \mmStokes\ssLin | Conversion to linear representation |
| $S^{\odot}$ | \mmStokes\ssCir | Conversion to circular representation |
| $\mathcal{M}$ | \mmMueller | Mueller matrix: Stokes to Stokes through optical ‘element’ |
| $X, x$ | \mmXifr,\mmXifrEl | Correlator matrix ($4 \times 4$) |
| $M, m$ | \mmMifr,\mmMifrEl | Multiplicative interferometer-based gain matrix ($4 \times 4$) |
| $\vec{A}, a$ | \vvAifr,\vvAifrEl | Additive interferometer-based gain vector |

Additive interferometer-based gain vector.

The following feed-based Jones matrices ($2 \times 2$) have a well-defined meaning:

| Symbol | TeX | Meaning |
|---|---|---|
| $J, j$ | \mjJones,\mjJonesEl | Jones matrix, and one of its elements |
| $F, f$ | \mjFrot,\mjFrotEl | Faraday rotation (of the plane of linear pol.) |
| $T, t$ | \mjTrop,\mjTropEl | Atmospheric gain (refraction, extinction) |
| $P, p$ | \mjProj,\mjProjEl | Projected receptor angle(s) w.r.t. $x, y$ frame |
| $B, b$ | \mjBtot,\mjBtotEl | Total feed voltage pattern (i.e. $B = DEP$) |
| $E, e$ | \mjBeam,\mjBeamEl | Traditional feed voltage beam |
| $C, c$ | \mjConf,\mjConfEl | Feed configuration matrix (...) |
| $D, d$ | \mjDrcp,\mjDrcpEl | Leakage between receptors $a$ and $b$ |
| $H, h$ | \mjHybr,\mjHybrEl | Hybrid network, to convert to circular pol. |
| $G, g$ | \mjGrec,\mjGrecEl | Feed-based electronic gain |
| $K, k$ | \mjKern,\mjKernEl | Fourier Transform Kernel (baseline phase weight) |
| $K^{0}, k^{0}$ | \mjKref,\mjKrefEl | FT kernel for the fringe-stopping centre |
| $K', k'$ | \mjKoff,\mjKoffEl | FT kernel relative to the fringe-stopping centre |
| $Q, q$ | \mjQsum,\mjQsumEl | Electronic gain of tied-array feed after summing |

Some special matrices and vectors:

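| Symbol | TeX | Meaning |
|---|---|---|
| $\mathrm{Zero}$ | \mmZero | Zero matrix |
| $\vec{0}$ | \vvZero | Zero vector |
| $\mathcal{U}$ | \mmUnit | Unit matrix |
| $\mathrm{Diag}(a, b)$ | \mjDiag | Diagonal matrix with elements $a, b$ |
| $\mathrm{Mult}(a)$ | \mjMult | Multiplication with factor $a$ |
| $\mathrm{Rot}(\alpha[, \beta])$ | \mjRot | [pseudo] Rotation over an angle $\alpha$ [, $\beta$] |
| $\mathrm{Ell}(\alpha[, \beta])$ | \mjEll | Ellipticity angle[s] $\alpha$ [, $\beta$] |
| $\mathcal{H}$ | \mjLtoC | Signal conversion from linear to circular |
| $\mathcal{H}^{-1}$ | \mjCtoL | Signal conversion from circular to linear |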

Definitions of some special matrices:

$$\mathrm{Diag}(a, b) \equiv \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, \qquad \mathrm{Diag}(a, a) = \mathrm{Mult}(a) = a \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad (58)$$

A ‘pure’ rotation $\mathrm{Rot}(\alpha)$ is a special case of a ‘pseudo rotation’ $\mathrm{Rot}(\alpha, \beta)$:

$$\mathrm{Rot}(\alpha, \beta) \equiv \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\beta & \cos\beta \end{pmatrix}, \qquad \mathrm{Rot}(\alpha) \equiv \mathrm{Rot}(\alpha, \alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \qquad (59)$$

Ellipticity:

$$\mathrm{Ell}(\alpha, \beta) \equiv \begin{pmatrix} \cos\alpha & i\sin\alpha \\ -i\sin\beta & \cos\beta \end{pmatrix}, \qquad \mathrm{Ell}(\alpha) \equiv \mathrm{Ell}(\alpha, -\alpha) = \begin{pmatrix} \cos\alpha & i\sin\alpha \\ i\sin\alpha & \cos\alpha \end{pmatrix} \qquad (60)$$

A.5 Miscellaneous parameters

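| Symbol | TeX | Meaning |
|---|---|---|
| $\beta$ | \ppParall | Parallactic angle, from North pole to zenith |
| $HA$ | \ppHA | Hour Angle |
| $RA$ | \ppRA | Right Ascension |
| $DEC$ | \ppDEC | Declination |
| $LAT$ | \ppLAT | Latitude on Earth |
| $t$ | \ccT | Time |
| $f$ | \ccF | Frequency |
| $\chi$ | \ppFarad | Faraday rotation angle |
| $a$ | \ppAmpl | Amplitude |
| $\psi$ | \ppPhase | Phase |
| $\zeta$ | \ppPhaseZero | Phase zero |
| $\phi$ | \ppRcpPosDev | Dipole position angle error |
| $\theta$ | \ppRcpEllDev | Receptor ellipticity |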

¹The generic IF-channel labels $p$ and $q$ are known as $X$ and $Y$ for WSRT and ATCA, and $R$ and $L$ for the VLA. They should not be confused with the two receptors $a$ and $b$ , since the signal in an IF-channel may be a linear combination of the receptor signals.

²Also called the outer matrix product, or tensor product, or Kronecker product. See [2].

³In one influential book [12], the factor $0.5$ is omitted from $S^{\odot}$. This is clearly incorrect, since a single receptor can never measure more than one half of the total flux of an unpolarised source.

⁴One might argue that a more consistent form of $\mathcal{H}$ would be an expression in terms of the $\pm\pi/4$ ellipticities that are intrinsic to a circular receptor:

$$\mathcal{H}^{alternative} = \mathrm{Ell}(\pi/4, -\pi/4) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} \mathcal{H} \qquad (17)$$

However, a choice for a different $\mathcal{H}$ should not be made lightly, since it would affect the deeply entrenched form of the Stokes matrices.