MeasurementSet access in AIPS++
Athol Kemball
October 15, 2000
A pdf version of this note is available.
Contents
1 Introduction
The C++ MeasurementSet (MS) access classes are fundamental to the AIPS++ library, and find general use in
several areas, including data fillers, the ms tool and a range of data reduction applications. This note is focused
primarily on basic MS access in synthesis at this time; it describes the design of the current classes, and
considers their future extension and consolidation. The MS data format is described in AIPS++ Note
229.
2 Requirements for the MS access classes
MeasurementSets are AIPS++ Tables and their existing access classes are built closely on the Table infrastructure.
This assumption is implicit in the requirements listed in this section. The C++ classes used for MS access need to
meet the following requirements, or provide the following services:
- Basic whole MS operations: MS creation, deletion and related MS-level operations.
- Column and row access: Access to MS MAIN and sub-table rows and columns.
- Sorting and selection: The ability to support bulk sorting and selection operations. MS selections
are expected to be expressed in a close variant of the Table Query Language (TaQL), defined elsewhere.
- MAIN MS iteration: Sequential iteration through the MAIN MS table on arbitrary iteration indices,
with optional retrieval of associated MS sub-table data matching the current MAIN table iteration
interval. Indexed access to the MAIN table, on limited key indices, is a secondary requirement. MAIN
table iteration includes simultaneous or sequential iteration through multiple MS considered as a set.
Iteration is also required within MS MAIN rows, in frequency, velocity or polarization.
- MS sub-table iteration: Independent sequential iteration through individual MS sub-tables, on
arbitrary iteration indices. Keyed access to MS sub-tables in read-write mode, on arbitrary key indices.
Iteration is also required within rows, in frequency, velocity or polarization, as appropriate for individual
sub-tables.
- MS derived quantities: Computation or retrieval of derived MS quantities, which require data access
across the MS as a whole. Examples include elevation computation, assembling higher-level coordinate
information or retrieval of Doppler shift parameters.
- MS data ranges: Determination of the data ranges in a given MS, for any MAIN or sub-table columns
or derived quantities.
- MS data objects: Provision of C++ classes to model a data cube containing part or all of the current
iteration or selection, which can be operated upon by other agent classes (e.g. calibration), and passed
between AIPS++ classes and between C++ and Glish.
- Connection to Glish: Versatile access to the C++ MS access classes from Glish.
- Scratch column management: Creation and deletion of scratch columns which may be added to,
or removed from the MS during reduction.
- Parallel I/O support: Full integration of parallel I/O capabilities within the standard serial MS
access classes, and the ability to optionally enable the parallel I/O capabilities.
- Table infrastructure use: Full re-use of all Table system infrastructure and design philosophy
wherever possible in the MS access classes.
- Efficiency: I/O is a critical part of astronomical data reduction performance, and efficiency is a key
requirement for MS access. This includes integrated I/O profiling, in both the serial and parallel case.
3 Current capabilities and design
The following capabilities are currently provided:
- Basic whole MS operations: The basic classes for creating a MS or MS
sub-table are: MeasurementSet, MSAntenna, MSDataDescription, MSDoppler, MSFeed, MSField,
MSFlagCmd, MSFreqOffset, MSHistory, MSObservation, MSPointing, MSPolarization,
MSProcessor, MSSource, MSSpectralWindow, MSState, MSSysCal, and MSWeather. The MSTable
class is a templated base class for the MS MAIN and sub-tables.
- Column and row access: The table column accessors are provided by MS*Columns, and the table
column definitions are defined in MS*Enums. MSColumns provides access to the MS as a whole. Missing:
no specialized MS row-based access.
- Sorting and selection: MS sorting and selection is possible using the TaQL capabilities directly.
Sorting and selection are also performed by MSSelector, to some degree in MSIter, and in specialized
form by reduction classes such as imager and calibrater, amongst others. Missing: i) unified MS
selection to TaQL converter (partially implemented in [calibrater|imager].g) allowing aliases,
sub-table lookup, derived quantities and [0,1] indexing; ii) centralization of MS selection services in a
utility class; and, iii) unification of MSSelector GlishRecord selection syntax and a general MS selection
syntax, including unification of MSSelector keywords and MSCalEnums.
- MAIN MS iteration: Sequential iteration on limited indices is provided by MSIter and
VisibilityIterator, using TableIterator. This includes velocity or frequency iteration within
a row, and iteration over rows with the same time stamp within an iteration interval. Sequential
iteration through multiple MeasurementSets is supported. Higher-level access to MSIter is provided by
MSSelector. Missing: i) arbitrary iteration indices; ii) more control over velocity iteration including
multi-source support; iii) access to general MS selection; iv) simultaneous, multi-MS iteration; v)
retrieval of MS sub-table rows associated with the current MAIN iteration block; vi) write access to
more columns; vii) customized sub-iteration within a MAIN iteration block; and viii) indexed MAIN
access.
- MS sub-table iteration: Read-only, indexed access is possible in certain forms using the MS*Index
classes, which are built on top of ColumnsIndex. Missing: i) arbitrary, read-write keyed lookup, with
different forms of interpolation where required; and, ii) specialized sequential iterators.
- MS derived quantities: Currently provided by several classes, including MSDerivedValues, and
MSDopplerUtil, amongst others. Missing: other computed quantities as needed.
- MS data ranges: Currently performed by MSRange. Missing: C++ interface which does not use
GlishRecords.
- MS data objects: Currently provided by VisBuffer, but specialized forms also implemented
in MSFlagger. Missing: i) greater MS coverage; ii) optional write-through to underlying MS; iii)
customized frequency averaging; and, iv) re-select on buffer using standard MS selection syntax.
- Connection to Glish: Currently provided by the ms tool, using MSSelector, MSSelUtil[2],
MSRange, and MSFlagger primarily. Missing: i) greater unification with VisibilityIterator and
VisBuffer; ii) duplicates some MS utility functions implemented elsewhere (e.g. time, frequency
averaging).
- Scratch column management: VisSet currently handles creation and addition of the MAIN
columns: MODEL_DATA, CORRECTED_DATA and IMAGING_WEIGHT.
- Parallel I/O support: Prototype parallel I/O capabilities exist in pimager::tryparread. Missing:
full parallel I/O implementation.
- Table infrastructure use: It is believed that no infrastructure is implemented in the current MS
access classes which should properly be moved to the Table system. Missing: i) large file support in
the Table system; and, ii) ability to remove existing columns held in TSM
- Efficiency: I/O profiling curently is implemented using PABLO (UIUC). Missing: continued profiling
of computational and I/O components.
4 Proposed revisions
This section considers short-term revisions to the existing MS access classes to rationalize certain existing
capabilities, and to add missing capabilities required for application development in several areas in the short
term.
- Consolidate Glish access layer: At present, methods to pack output MS data into GlishRecords, and
to unpack and accept input data and selections in GlishRecord format, exist in MSFlagger, MSRange
and MSSelector. These classes do not use VisibilityIterator or VisBuffer, but use buffers of
GlishRecords to hold the data where required in C++. The proposal in this area is to isolate the
GlishRecord interface code in a separate class called MSGlishData (or similar), and to migrate to the use
of VisibilityIterator and VisBuffer in the classes mentioned above. This will allow consolidation
of code existing elsewhere for data averaging in time and frequency. The Glish interface class will also
be responsible for translating between a VisBuffer and a Glish data object, which will remain in
GlishRecord representation.
- Selection: The proposal in this area is to concentrate MS selection and sorting only in MSSelector,
including: a) add MS selection to TaQL converter (currently in Glish); b) form union MSCalEnums
and MSSelectionKeywords; c) add sorting (col. names); d) use new MSSelector in MSIter, imager and
calibrater.
- MSIter, VisibilityIterator, and VisBuffer: The major changes to these classes include addition of
access to selection, and sub-table look-up. Specifically: a) add general selection to constructor, which
uses new MSSelector; b) add sub-table lookup for current MAIN iterator interval, and propagate these
changes to VisIter and VisBuffer.