casacore

CTDS (Casacore Table Data System) is the data storage mechanism for Casacore.

Modules

 Tables_module_internal_classes
 Internal Tables_module classes and functions.
 

Classes

class  casacore::ArrayColumn< T >
 Read and write access to an array table column with arbitrary data type.
class  casacore::ArrayColumnBase
 Read and write access to an array table column with arbitrary data type.
class  casacore::BaseSlicesFunctor
 Abstract base class for slices functors.
class  casacore::GetCellSlices
 Functor to get irregular array slices from a cell.
class  casacore::GetColumnSlices
 Functor to get irregular array slices from a column.
class  casacore::PutCellSlices
 Functor to put irregular array slices into a cell.
class  casacore::PutColumnSlices
 Functor to put irregular array slices into a column.
class  casacore::ArrayColumnDesc< T >
 Templated class for description of table array columns.
class  casacore::ColumnDesc
 Envelope class for the description of a table column.
class  casacore::ColumnsIndex
 Index to one or more columns in a table.
class  casacore::ColumnsIndexArray
 Index to an array column in a table.
struct  casacore::ReadAsciiTable_global_functions_readAsciiTable
 Filling a table from an ASCII file.
class  casacore::RowCopier
 RowCopier copies all or part of a row from one table to another.
class  casacore::ScalarColumnDesc< T >
 Templated class to define columns of scalars in tables.
class  casacore::ScalarColumn< T >
 Access to a scalar table column with arbitrary data type.
class  casacore::ScalarRecordColumnDesc
 Class to define columns of scalar records in tables.
class  casacore::SetupNewTable
 Create a new table - define shapes, data managers, etc.
class  casacore::StorageOption
 Options defining how table files are organized.
class  casacore::SubTableDesc
 Description of columns containing tables.
class  casacore::Table
 Main interface class to a read/write table.
class  casacore::TableColumn
 Read/write access to a table column.
class  casacore::TableCopy
 Class with static functions for copying a table.
class  casacore::TableDesc
 Define the structure of a Casacore table.
class  casacore::TableError
 Base error class for storage manager.
class  casacore::TableInternalError
 Internal table error.
class  casacore::TableDuplFile
 Table error; table (description) already exists.
class  casacore::TableNoFile
 Table error; table (description) not found.
class  casacore::TableDescNoName
 Table error; no name given to table description.
class  casacore::TableInvOpt
 Table error; invalid table (description) option.
class  casacore::TableNoDatFile
 Table error; table.dat file not found.
class  casacore::TableInvType
 Table error; table type mismatch.
class  casacore::TableInvColumnDesc
 Table error; invalid column description.
class  casacore::TableInvHyperDesc
 Table error; invalid hypercolumn description.
class  casacore::TableUnknownDesc
 Table error; unknown column description.
class  casacore::TableInvDT
 Table error; invalid data type.
class  casacore::TableInvOper
 Table error; invalid operation.
class  casacore::TableArrayConformanceError
 Table error; non-conformant array.
class  casacore::TableConformanceError
 Table error; table length conformance error.
class  casacore::TableInvSort
 Table error; invalid sort.
class  casacore::TableInvLogic
 Table error; invalid logical operation.
class  casacore::TableInvExpr
 Table error; invalid select expression.
class  casacore::TableVectorNonConform
 Table error; non-conformant table vectors.
class  casacore::TableParseError
 Table error; invalid table command.
class  casacore::TableGramError
 Table grammar error; invalid table command.
class  casacore::TableIndexProxy
 Proxy for table index access.
class  casacore::TableIterator
 Iterate through a Table.
class  casacore::TableIterProxy
 Proxy for table iterator access.
class  casacore::TableLocker
 Class to hold a (user) lock on a table.
class  casacore::TableProxy
 High-level interface to tables.
class  casacore::TableRecord
 A hierarchical collection of named fields of various types.
class  casacore::ROTableRow
 Readonly access to a table row.
class  casacore::TableRow
 Read/write access to a table row.
class  casacore::TableRowProxy
 Proxy for table row access.
class  casacore::TableVector< T >
 Templated readonly table column vectors.
struct  casacore::TabVecMath_global_functions_basicMath
 Basic math for table vectors.
struct  casacore::TabVecMath_global_functions_basicTransMath
 Transcendental math for table vectors.
struct  casacore::TabVecMath_global_functions_advTransMath
 Further transcendental math for table vectors.
struct  casacore::TabVecMath_global_functions_miscellaneous
 Miscellaneous table vector operations.
struct  casacore::TabVecMath_global_functions_vectorMath
 Vector operations on a table vector.
 

Detailed Description

CTDS (Casacore Table Data System) is the data storage mechanism for Casacore.

See below for an overview of the classes in this module.

Intended use: Public interface

Review Status

Reviewed By: jhorstko
Date Reviewed: 1994/08/30

Prerequisite

Etymology

"Table" is a formal term from relational database theory: "The organizing principle in a relational database is the TABLE, a rectangular, row/column arrangement of data values." Casacore tables are extensions to traditional tables, but are similar enough that we use the same name. There is also a strong resemblance between the uses of Casacore tables and FITS binary tables, which provides another reason to use "Tables" to describe the Casacore data storage mechanism.

Synopsis

Tables are the fundamental storage mechanism for Casacore. This document explains why they had to be made, what their properties are, and how to use them. The last subject is discussed and illustrated in the sections that follow.

A few applications exist to inspect and manipulate a table.

Several UML diagrams describe the class structure of the Tables module.

Motivation

The Casacore tables are mainly based upon the ideas of Allen Farris, as laid out in the AIPS++ Database document, from where the following paragraph is taken:

Traditional relational database tables have two features that decisively limit their applicability to scientific data. First, an item of data in a column of a table must be atomic – it must have no internal structure. A consequence of this restriction is that relational databases are unable to deal with arrays of data items. Second, an item of data in a column of a table must not have any direct or implied linkages to other items of data or data aggregates. This restriction makes it difficult to model complex relationships between collections of data. While these restrictions may make it easy to define a mathematically complete set of data manipulation operations, they are simply intolerable in a scientific data-handling context. Multi-dimensional arrays are frequently the most natural modes in which to discuss and think about scientific data. In addition, scientific data often requires complex calibration operations that must draw on large bodies of data about equipment and its performance in various states. The restrictions imposed by the relational model make it very difficult to deal with complex problems of this nature.

In response to these limitations, and other needs, the Casacore tables were designed.

Table Properties

Casacore tables have the following properties:

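Tables can be in one of four forms: plain tables, memory tables, reference tables (for example the result of a selection or sort), and concatenated tables.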

Concurrent access from different processes to the same plain table is fully supported by means of a locking/synchronization mechanism. Concurrent access over NFS is also supported.

A (somewhat primitive) mechanism is available to do a table lookup based on the contents of a key.

Opening an Existing Table

To open an existing table you just create a Table object giving the name of the table, like:

Table readonly_table ("tableName");
// or
Table read_and_write_table ("tableName", Table::Update);

The constructor option determines whether the table will be opened as readonly or as read/write. A readonly table file must be opened as readonly, otherwise an exception is thrown. The functions Table::isWritable(...) can be used to determine if a table is writable.

When the table is opened, the data managers are reinstantiated according to their definition at table creation.

The static function TableUtil::openTable can be used to open a table, in particular a subtable, in a simple way by means of the :: notation, like maintable::subtable. The :: notation is preferable to specifying an explicit path (such as maintable/subtable), because it also works if the main table is a reference table (e.g. the result of a selection).

Reading from a Table

You can read data from a table column with the "get" functions in the classes ScalarColumn<T> and ArrayColumn<T>. For scalars of a standard data type (i.e. Bool, uChar, Int, Short, uShort, uInt, float, double, Complex, DComplex and String) you could instead use TableColumn::getScalar(...) or TableColumn::asXXX(...). These functions offer an extra: they do automatic data type promotion; so that you can, for example, get a double value from a float column.

These "get" functions are used in the same way as the simple "put" functions described in the section Writing into a Table.

ScalarColumn<T> can be constructed for a non-writable column. However, an exception is thrown if the put function is used for it. The same is true for ArrayColumn<T> and TableColumn.

A typical program could look like:

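#include <casacore/tables/Tables/Table.h>
#include <casacore/tables/Tables/ScalarColumn.h>
#include <casacore/tables/Tables/ArrayColumn.h>
#include <casacore/casa/Arrays/ArrayIO.h>
#include <casacore/casa/Arrays/Slice.h>
#include <casacore/casa/Arrays/Slicer.h>
#include <iostream>
using namespace casacore;
using namespace std;
int main()
{
  // Open the table (readonly).
  Table tab ("some.name");
  // Construct the various column objects.
  // Their data type has to match the data type in the table description.
  ScalarColumn<Int> acCol (tab, "ac");
  ArrayColumn<Float> arr2Col (tab, "arr2");
  // Loop through all rows in the table.
  uInt nrrow = tab.nrow();
  for (uInt i=0; i<nrrow; i++) {
    // Read the row for both columns.
    cout << "Column ac in row " << i << " = " << acCol(i) << endl;
    Array<Float> array = arr2Col.get (i);
  }
  // Show the entire column ac,
  // and show element 10 of arr2 in each row.
  cout << acCol.getColumn();
  cout << arr2Col.getColumn (Slicer(Slice(10)));
  return 0;
}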

Creating a Table

The creation of a table is a multi-step process:

  1. Create a table description.
  2. Create a SetupNewTable object with the name of the new table.
  3. Create the necessary data managers.
  4. Bind each column to the appropriate data manager. The system will bind unbound columns to data managers which are created internally using the default data manager name defined in the column description.
  5. Define the shape of direct columns (if that was not already done in the column description).
  6. Create the Table object from the SetupNewTable object. Here, a final check is performed and the necessary files are created.

The recipe above is meant for the creation of a plain table, but the creation of a memory table is exactly the same. The only difference is that the Table::Memory type has to be given in the call that constructs the Table object. Note that in the SetupNewTable object the columns can be bound to any data manager. MemoryTable will rebind stored columns to the MemoryStMan storage manager, but virtual column bindings are not changed.

The following example shows how you can create a table. An example specifically illustrating the creation of the table description is given in the section Table Description. Other sections discuss the access to the table.

#include <casacore/tables/Tables/TableDesc.h>
#include <casacore/tables/Tables/SetupNewTab.h>
#include <casacore/tables/Tables/Table.h>
#include <casacore/tables/Tables/ScaColDesc.h>
#include <casacore/tables/Tables/ScaRecordColDesc.h>
#include <casacore/tables/Tables/ArrColDesc.h>
// In current casacore the storage manager headers live in tables/DataMan.
#include <casacore/tables/DataMan/StandardStMan.h>
#include <casacore/tables/DataMan/IncrementalStMan.h>
#include <casacore/casa/Arrays/IPosition.h>
using namespace casacore;
int main()
{
  // Step 1 -- Build the table description.
  TableDesc td("tTableDesc", "1", TableDesc::Scratch);
  td.comment() = "A test of class SetupNewTable";
  td.addColumn (ScalarColumnDesc<Int> ("ab","Comment for column ab"));
  td.addColumn (ScalarColumnDesc<Int> ("ac"));
  td.addColumn (ScalarColumnDesc<uInt> ("ad","comment for ad"));
  td.addColumn (ScalarColumnDesc<Float> ("ae"));
  td.addColumn (ScalarRecordColumnDesc ("arec"));
  td.addColumn (ArrayColumnDesc<Float> ("arr1",3,ColumnDesc::Direct));
  td.addColumn (ArrayColumnDesc<Float> ("arr2",0));
  td.addColumn (ArrayColumnDesc<Float> ("arr3",0,ColumnDesc::Direct));
  // Step 2 -- Setup a new table from the description.
  SetupNewTable newtab("newtab.data", td, Table::New);
  // Step 3 -- Create storage managers for it.
  StandardStMan stmanStand_1;
  StandardStMan stmanStand_2;
  IncrementalStMan stmanIncr;
  // Step 4 -- First, bind all columns to the first storage
  // manager. Then, bind a few columns to another storage manager
  // (which will overwrite the previous bindings).
  newtab.bindAll (stmanStand_1);
  newtab.bindColumn ("ab", stmanStand_2);
  newtab.bindColumn ("ae", stmanIncr);
  newtab.bindColumn ("arr3", stmanIncr);
  // Step 5 -- Define the shape of the direct columns
  // (this could have been done in the column description).
  newtab.setShapeColumn("arr1", IPosition(3,2,3,4));
  newtab.setShapeColumn("arr3", IPosition(3,3,4,5));
  // Step 6 -- Finally, create the table consisting of 10 rows.
  Table tab(newtab, 10);
  // Now we can fill the table, which is shown in a later section.
  // The Table destructor will flush the table to the files.
  return 0;
}

To create a table in memory, only step 6 has to be modified slightly to:

Table tab(newtab, Table::Memory, 10);

Note that the function TableUtil::createTable can be used to create a table in a simpler way. It can also be used to create a subtable using the :: notation, similar to the TableUtil::openTable function described above.

Writing into a Table

Once a table has been created or has been opened for read/write, you want to write data into it. Before doing that you may have to add one or more rows to the table.
Tip: If a table was created with a given number of rows, you do not need to add rows; you may not even be able to do so.

When adding new rows to the table, either via the Table(...) constructor or via the Table::addRow(...) function, you can choose to have those rows initialized with the default values given in the description.

To actually write the data into the table you need the classes ScalarColumn<T> and ArrayColumn<T>. For each column you can construct one or more of these objects. Their put(...) functions let you write a value at a time or the entire column in one go. For arrays you can "put" subsections of the arrays.

As an alternative for scalars of a standard data type (i.e. Bool, uChar, Int, Short, uShort, uInt, float, double, Complex, DComplex and String) you could use the functions TableColumn::putScalar(...). These functions offer an extra: automatic data type promotion; so that you can, for example, put a float value in a double column.

A typical program could look like:

#include <casacore/tables/Tables/TableDesc.h>
#include <casacore/tables/Tables/SetupNewTab.h>
#include <casacore/tables/Tables/Table.h>
#include <casacore/tables/Tables/ScaColDesc.h>
#include <casacore/tables/Tables/ArrColDesc.h>
#include <casacore/tables/Tables/ScalarColumn.h>
#include <casacore/tables/Tables/ArrayColumn.h>
#include <casacore/casa/Arrays/Vector.h>
#include <casacore/casa/Arrays/ArrayMath.h>
#include <casacore/casa/Arrays/ArrayIO.h>
#include <casacore/casa/Arrays/Slice.h>
#include <casacore/casa/Arrays/Slicer.h>
#include <iostream>
using namespace casacore;
using namespace std;
int main()
{
  // First build the table description.
  TableDesc td("tTableDesc", "1", TableDesc::Scratch);
  td.comment() = "A test of class SetupNewTable";
  td.addColumn (ScalarColumnDesc<Int> ("ac"));
  td.addColumn (ArrayColumnDesc<Float> ("arr2",0));
  // Setup a new table from the description,
  // and create the (still empty) table.
  // Note that since we do not explicitly bind columns to
  // data managers, all columns will be bound to the default
  // standard storage manager StandardStMan.
  SetupNewTable newtab("newtab.data", td, Table::New);
  Table tab(newtab);
  // Construct the various column objects.
  // Their data type has to match the data type in the description.
  ScalarColumn<Int> ac (tab, "ac");
  ArrayColumn<Float> arr2 (tab, "arr2");
  Vector<Float> vec2(100);
  // Write the data into the columns.
  // In each cell arr2 will be a vector of length 100.
  // Since its shape is not set explicitly, it is done implicitly.
  for (uInt i=0; i<10; i++) {
    tab.addRow();                // First add a row.
    ac.put (i, Int(i+10));       // value is i+10 in row i
    indgen (vec2, float(i+20));  // vec2 gets i+20, i+21,..., i+119
    arr2.put (i, vec2);
  }
  // Finally, show the entire column ac,
  // and show element 10 of arr2 in each row.
  cout << ac.getColumn();
  cout << arr2.getColumn (Slicer(Slice(10)));
  // The Table destructor writes the table.
  return 0;
}

In this example we added rows in the for loop, but we could also have created 10 rows straightaway by constructing the Table object as:

Table tab(newtab, 10);

in which case we would not include

tab.addRow()

The classes TableColumn, ScalarColumn<T>, and ArrayColumn<T> contain several functions to put values into a single cell or into the whole column. This may look confusing, but is actually quite simple. The functions can be divided into two groups:

  1. Put the given value into the column cell(s).

  2. Copy values from another column to this column.
    These functions have the advantage that the data type of the input and/or output column can be unknown. The generic TableColumn objects can be used for this purpose. The put(Column) function checks the data types and, if possible, converts them. If the conversion is not possible, it throws an exception.
    • The put functions copy the value in a cell of the input column to a cell in the output column. The row numbers of the cells in the columns can be different.
    • The putColumn functions copy the entire contents of the input column to the output column. The lengths of the columns must be equal.
    Each class has its own set of these functions.

Accessing rows in a Table

Apart from accessing a table column-wise as described in the previous two sections, it is also possible to access a table row-wise. The TableRow class makes it possible to access multiple fields in a table row as a whole. Note that like the XXColumn classes described above, there is also an ROTableRow class for access to readonly tables.

On construction of a TableRow object it has to be specified which fields (i.e. columns) are part of the row. For these fields a fixed structured TableRecord object is constructed as part of the TableRow object. The TableRow::get function will fill this record with the table data for the given row. The user has access to the record and can use RecordFieldPtr objects for speedier access to the record.

The class could be used as shown in the following example.

// Open the table as readonly and define a row object to contain
// the given columns.
// Note that the function stringToVector is a very convenient
// way to construct a Vector<String>.
// Show the description of the fields in the row.
Table table("Some.table");
ROTableRow row (table, stringToVector("col1,col2,col3"));
cout << row.record().description();
// Since the structure of the record is known, the RecordFieldPtr
// objects could be used to allow for easy and fast access to
// the record which is refilled for each get.
RORecordFieldPtr<String> col1(row.record(), "col1");
RORecordFieldPtr<Double> col2(row.record(), "col2");
RORecordFieldPtr<Array<Int> > col3(row.record(), "col3");
for (uInt i=0; i<table.nrow(); i++) {
  // Refill the record with the data of row i.
  row.get (i);
  String someString = *col1;
  Double someDouble = *col2;
  Array<Int> someArrayInt = *col3;
}

The description of TableRow contains some more extensive examples.

Table Selection and Sorting

The result of a select and sort of a table is another table, which references the original table. This means that an update of a sorted or selected table results in the update of the original table. The result is, however, a table in itself, so all table functions (including select and sort) can be used with it. Note that a true copy of such a reference table can be made with the Table::deepCopy function.
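
The reference semantics described above can be pictured with a small stand-alone model. This is a sketch in plain C++ and not casacore code: the names RefColumn and select are hypothetical, and a single std::vector<int> stands in for a column of the root table. The point it illustrates is that the selection result stores row numbers rather than copies, so a put through the result changes the root data.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-in (hypothetical, not a casacore class) for a reference table
// over one integer column: it stores row numbers of the root column
// instead of copies of the data.
struct RefColumn {
    std::vector<int>& root;          // the original ("root") column
    std::vector<std::size_t> rows;   // selected row numbers in the root

    std::size_t nrow() const { return rows.size(); }
    int get(std::size_t i) const { return root[rows[i]]; }
    // Writing through the reference updates the root column.
    void put(std::size_t i, int value) { root[rows[i]] = value; }
};

// Select the rows for which the predicate holds.
RefColumn select(std::vector<int>& root, const std::function<bool(int)>& pred)
{
    RefColumn result{root, {}};
    for (std::size_t i = 0; i < root.size(); ++i) {
        if (pred(root[i])) {
            result.rows.push_back(i);
        }
    }
    return result;
}
```

In this model a deep copy would correspond to copying the selected values into a new vector, after which writes no longer reach the root.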

Rows or columns can be selected from a table. Columns can be selected by the Table::project(...) function, while rows can be selected by the various Table operator() functions. Usually a row is selected by giving a select expression with TableExprNode objects. These objects represent the various nodes in an expression, e.g. a constant, a column, or a subexpression. The Table function Table::col(...) creates a TableExprNode object for a column. The function Table::key(...) does the same for a keyword by reading the keyword value and storing it as a constant in an expression node. All column nodes in an expression must belong to the same table, otherwise an exception is thrown. In the following example we select all rows with RA>10:

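#include <casacore/tables/Tables/Table.h>
// In current casacore the expression classes live in tables/TaQL.
#include <casacore/tables/TaQL/ExprNode.h>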
Table table ("Table.name");
Table result = table (table.col("RA") > 10);

while in the next one we select rows with RA and DEC in the given intervals:

Table result = table (table.col("RA") > 10
&& table.col("RA") < 14
&& table.col("DEC") >= -10
&& table.col("DEC") <= 10);

The following operators can be used to form arbitrarily complex expressions:

Many functions (like sin, max, conj) can be used in an expression. Class TableExprNode shows the available functions. E.g.

Table result = table (sin (table.col("RA")) > 0.5);

Function in can be used to select from a set of values. A value set can be constructed using class TableExprNodeSet.

TableExprNodeSet set;
set.add (TableExprNodeSetElem ("abc"));
set.add (TableExprNodeSetElem ("defg"));
set.add (TableExprNodeSetElem ("h"));
Table result = table (table.col("NAME").in (set));

selects rows whose NAME is equal to abc, defg, or h.

You can sort a table on one or more columns containing scalars. In this example we simply sort on column RA (default is ascending):

Table table ("Table.name");
Table result = table.sort ("RA");

Multiple Table::sort(...) functions exist which allow for more flexible control over the sort order. In the next example we sort first on RA in descending order and then on DEC in ascending order:

Table table ("Table.name");
Block<String> sortKeys(2);
Block<Int> sortOrders(2);
sortKeys[0] = "RA";
sortOrders[0] = Sort::Descending;
sortKeys[1] = "DEC";
sortOrders[1] = Sort::Ascending;
Table result = table.sort (sortKeys, sortOrders);

Tables stemming from the same root can be combined in several ways with the help of the various logical Table operators (operator|, etc.).

Table Query Language

The selection and sorting mechanism described above can only be used in a hard-coded way in a C++ program. There is, however, another way. Strings containing selection and sorting commands can be used. The syntax of these commands is based on SQL and is described in the Table Query Language (TaQL) note 199. The language supports UDFs (User Defined Functions) in dynamically loadable libraries as explained in the note.
A TaQL command can be executed with the static function tableCommand defined in class TableParse.

Table Concatenation

Tables with identical descriptions can be concatenated in a virtual way using the Table concatenation constructor. Such a Table object behaves as any other Table object, thus any operation can be performed on it. An identical description means that the number of columns, the column names, and the data types of the columns must be the same. The columns do not need to be ordered in the same way nor to be stored in the same way.
Note that if tables have different column names, it is possible to form a projection (as described in the section Table Selection and Sorting) first to make them appear identical.

Sometimes a MeasurementSet is partitioned, for instance in chunks of one hour. All those chunks can be virtually concatenated this way. Note that all tables in the concatenation will be opened, thus one might run out of file descriptors if there are many chunks.

Similar to reference tables, it is possible to make a concatenated Table persistent by using the rename function. It will not copy the data; only the names of the tables used are written.

The keywords of a concatenated table are taken from the first table. It is possible to change or add keywords, but that is not persistent, not even if the concatenated table is made persistent.
The keywords holding subtables can be handled in a special way. Normally the subtables of the first table are used as the subtables of the concatenation, but it is possible to concatenate subtables as well by giving their names in the constructor. In this way the SYSCAL subtable of a MeasurementSet, for example, can be concatenated as well.

// Create virtual concatenation of ms0 and ms1.
Block<String> names(2);
names[0] = "ms0";
names[1] = "ms1";
// Also concatenate their SYSCAL subtables.
Block<String> subNames(1, "SYSCAL");
Table concTab (names, subNames);
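
What "virtual" concatenation means for row access can be sketched with a stand-alone model. The function below is plain C++ and hypothetical (locateRow is not a casacore function, and casacore's own bookkeeping may differ): given the number of rows in each underlying table, it maps a row number of the concatenation to the table that holds the row and the row number inside that table, without copying any data.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy model (hypothetical helper): map a row number in the virtual
// concatenation to (index of the underlying table, row number within
// that table). rowsPerTable[k] is the number of rows in table k.
std::pair<std::size_t, std::size_t>
locateRow(const std::vector<std::size_t>& rowsPerTable, std::size_t row)
{
    for (std::size_t t = 0; t < rowsPerTable.size(); ++t) {
        if (row < rowsPerTable[t]) {
            return {t, row};
        }
        // Skip the rows of table t and continue with the next table.
        row -= rowsPerTable[t];
    }
    throw std::out_of_range("row number exceeds concatenated table length");
}
```

With two tables of 3 and 5 rows, row 3 of the concatenation is row 0 of the second table in this model.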

Table Iterators

You can iterate through a table in an arbitrary order by getting a subset of the table consisting of the rows in which the iteration columns have the same value. An iterator object is created by constructing a TableIterator object with the appropriate column names.

In the next example we define an iteration on the columns Time and Baseline. Each iteration step returns a table subset in which Time and Baseline have the same value.

// Iterate over Time and Baseline (by default in ascending order).
// Time is the main iteration order, thus the first column specified.
Table t;
Table tab ("UV_Table.data");
Block<String> iv0(2);
iv0[0] = "Time";
iv0[1] = "Baseline";
//
// Create the iterator. This will prepare the first subtable.
TableIterator iter(tab, iv0);
Int nr = 0;
while (!iter.pastEnd()) {
// Get the current subtable.
// This will contain rows with equal Time and Baseline.
t = iter.table();
cout << t.nrow() << " ";
nr++;
// Prepare the next subtable with the next Time,Baseline value.
iter.next();
}
cout << endl << nr << " iteration steps" << endl;

You can define more than one iterator on the same table; they operate independently.

Note that the result of each iteration step is a table in itself which references the original table, just as in the case of a sort or select. This means that the resulting table can be used again in a sort, select, iteration, etc.
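
The grouping performed by an iteration can be pictured with a small stand-alone model. This is a sketch in plain C++, not the TableIterator implementation: the Row struct and the groupRows function are hypothetical, and two plain fields stand in for the Time and Baseline columns of the example above. Each map entry corresponds to one iteration step and holds the row numbers that share the same key values.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Toy row (hypothetical): stands in for the Time and Baseline columns.
struct Row {
    double time;
    int baseline;
};

// Group row numbers by equal (time, baseline). std::map keeps the keys in
// ascending order, with time as the main iteration order.
std::map<std::pair<double, int>, std::vector<std::size_t>>
groupRows(const std::vector<Row>& rows)
{
    std::map<std::pair<double, int>, std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        groups[{rows[i].time, rows[i].baseline}].push_back(i);
    }
    return groups;
}
```

Because each group stores row numbers only, it behaves like the reference tables described earlier: the rows themselves stay in the original table.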

Table Vectors

A table vector makes it possible to treat a column in a table as a vector. Almost all operators and functions defined for normal vectors are also defined for table vectors. So it is, for instance, possible to add a constant to a table vector. This has the effect that the underlying column gets changed.

You can use the templated class TableVector to make a scalar column appear as a (table) vector. Columns containing arrays or tables are not supported. The data type of the TableVector object must match the data type of the column. A table vector can also hold a normal vector so that (temporary) results of table vector operations can be handled.

In the following example we double the data in column COL1 and store the result in a temporary table vector.

// Create a table vector for column COL1.
// Note that if the table is readonly, putting data in the table vector
// results in an exception.
Table tab ("Table.data");
TableVector<Int> tabvec(tab, "COL1");
// Multiply it by a constant. Result is kept in a Vector in memory.
TableVector<Int> temp = 2 * tabvec;

In the next example we double the data in COL1 and put the result back in the column.

// Create a table vector for column COL1.
// It has to be a TableVector to be able to change the column.
Table tab ("Table.data", Table::Update);
TableVector<Int> tabvec(tab, "COL1");
// Multiply it by a constant.
tabvec *= 2;

Table Keywords

Any number of keyword/value pairs may be attached to the table as a whole, or to any individual column. They may be freely added, retrieved, re-assigned, or deleted. They are, in essence, a self-resizing list of values (any of the primitive types) indexed by Strings (the keyword).

A table keyword/value pair might be

Observer = Grote Reber
Date = 10 October 1942

Column keyword/value pairs might be

Units = mJy
Reference Pixel = 320

The class TableRecord represents the keywords in a table. It is (indirectly) derived from the standard record classes in the class Record.

Table Description

A table contains a description of itself, which defines the layout of the columns and the keyword sets for the table and for the individual columns. It may also define initial keyword sets and default values for the columns. Such a default value is automatically stored in a cell in the table column, whenever a row is added to the table.

The creation of the table descriptor is the first step in the creation of a new table. The description is part of the table itself, but may also exist in a separate file. This is useful if you need to create a number of tables with the same structure; in other circumstances it probably should be avoided.

The public classes to set up a table description are:

Here follows a typical example of the construction of a table description. For more specialized things – like the definition of a default data manager – we refer to the descriptions of the above mentioned classes.

#include <casacore/tables/Tables/TableDesc.h>
#include <casacore/tables/Tables/ScaColDesc.h>
#include <casacore/tables/Tables/ArrColDesc.h>
#include <casacore/tables/Tables/ScaRecordColDesc.h>
#include <casacore/casa/Arrays/IPosition.h>
#include <casacore/casa/BasicSL/Complex.h>
using namespace casacore;
int main()
{
  // Create a new table description.
  // Define a comment for the table description.
  // Define some keywords.
  ColumnDesc colDesc1, colDesc2;
  TableDesc td("tTableDesc", "1", TableDesc::New);
  td.comment() = "A test of class TableDesc";
  td.rwKeywordSet().define ("ra", float(3.14));
  td.rwKeywordSet().define ("equinox", double(1950));
  td.rwKeywordSet().define ("aa", Int(1));
  // Define an integer column ab.
  td.addColumn (ScalarColumnDesc<Int> ("ab", "Comment for column ab"));
  // Add a scalar integer column ac, define keywords for it
  // and define a default value 0.
  ScalarColumnDesc<Int> acColumn("ac");
  acColumn.rwKeywordSet().define ("scale", Complex(0,0));
  acColumn.rwKeywordSet().define ("unit", "");
  acColumn.setDefault (0);
  td.addColumn (acColumn);
  // Overwrite the value of keyword unit.
  td.rwColumnDesc("ac").rwKeywordSet().define ("unit", "DEG");
  // Add a scalar string column ad and define its comment string.
  td.addColumn (ScalarColumnDesc<String> ("ad","comment for ad"));
  // Now define array columns.
  // This one is indirect and has no dimensionality mentioned yet.
  td.addColumn (ArrayColumnDesc<Complex> ("Arr1","comment for Arr1"));
  // This one is indirect and has 3-dim arrays.
  td.addColumn (ArrayColumnDesc<Int> ("Arr2","comment for Arr2",3));
  // This one is direct and has 2-dim arrays with axes length 4 and 7.
  td.addColumn (ArrayColumnDesc<uInt> ("Arr3","comment for Arr3",
                                       IPosition(2,4,7),
                                       ColumnDesc::Direct));
  // Add columns containing records.
  td.addColumn (ScalarRecordColumnDesc ("Rec1"));
  return 0;
}

Data Managers

Data managers take care of the actual access to the data in a column. There are two kinds of data managers:

  1. Storage managers – which store the data as such. They can only handle the standard data types (Bool,...,String), as discussed in the section about the table properties.
  2. Virtual column engines – which manipulate the data. An engine could be a simple thing like scaling the data (as done in classic AIPS to reduce data storage), but it could also be an elaborate thing like applying corrections on-the-fly.
    A special engine is VirtualTaQLColumn which can be used to define the contents of a column by means of a TaQL expression. In particular, it can be used to define a constant value for the entire column. But it can also be used to calculate the UVW-coordinates on-the-fly.
    An engine must be used when storing data objects with a non-standard type. It has to break down the object into items with standard data types which can be stored with a storage manager.

In general the user of a table does not need to be aware of which data managers are being used underneath. Data managers have to be bound to the columns only when the table is created. Thereafter their use is completely transparent.

Data managers need to be registered, so they can be found when a table is opened. All data managers mentioned below are part of the system and pre-registered. It is, however, also possible to load data managers on demand. If a data manager is not registered, the system tries to load a shared library named after the part of the data manager name (in lowercase) before a dot or left angle bracket (<). The dot makes it possible to have multiple data managers in a shared library, while the angle bracket is meant for templated data manager classes.
E.g. if BitFlagsEngine<uChar> was not registered, the shared library libbitflagsengine.so (or .dylib) will be loaded. If successful, its function register_bitflagsengine() will be executed, which should register the data manager(s). Thereafter the data manager is known and will be used. For example in a file Register.h and Register.cc:

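// Declare in .h file as C function, so no name mangling is done.
extern "C" {
void register_bitflagsengine();
}
// Implement in .cc file.
void register_bitflagsengine()
{
// Register the data manager(s) provided by this library.
BitFlagsEngine<uChar>::registerClass();
}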
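
The naming rule described above can be sketched in plain C++. This is only an illustration of the rule as stated in the text, not the loader code used by CTDS. The helper names libraryStem and registerFunctionName are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of casacore): take the part of the data
// manager name before the first '.' or '<' and convert it to lowercase.
std::string libraryStem(const std::string& dataManagerName)
{
    assert(!dataManagerName.empty());
    const std::string::size_type pos = dataManagerName.find_first_of(".<");
    // substr clamps the length, so npos yields the entire name.
    std::string stem = dataManagerName.substr(0, pos);
    std::transform(stem.begin(), stem.end(), stem.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return stem;
}

// Name of the C registration function expected in that library,
// following the register_bitflagsengine pattern from the text.
std::string registerFunctionName(const std::string& dataManagerName)
{
    return "register_" + libraryStem(dataManagerName);
}
```

For BitFlagsEngine<uChar> this gives the stem bitflagsengine, which matches the library name libbitflagsengine.so mentioned above.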

Several functions report which data managers are used for which columns and give their characteristics and properties. Class RODataManAccessor and derived classes can be used for this, as well as the functions dataManagerInfo and showStructure in class Table.

Storage Managers

Storage managers are used to store the data contained in the column cells. At table construction time the binding of columns to storage managers is done.
Each storage manager uses one or more files (usually called table.fi_xxx where i is a sequence number and _xxx is some kind of extension). Typically several files are used to store the data of the columns of a table.
In order to reduce the number of files (and to support large block sizes), it is possible to have a single container file (a MultiFile) containing all data files used by the storage managers. Such a file is called table.mf. Note that the program lsmf can be used to see which files are contained in a MultiFile. The program tomf can convert the files in a MultiFile to regular files.
At table creation time it is decided if a MultiFile will be used. It can be done by means of the StorageOption object given to the SetupNewTable constructor and/or by the aipsrc variables:

Nearly all standard storage managers support the MultiFile. The exception is StManAipsIO, because it is hardly ever used.

Several storage managers exist, each with its own storage characteristics. The default and preferred storage manager is StandardStMan. Other storage managers should only be used if they pay off in file space (like IncrementalStMan for slowly varying data) or access speed (like the tiled storage managers for large data arrays).
The storage managers store the data in a big or little endian canonical format. The format can be specified when the table is created. By default it uses the endian format as specified in the aipsrc variable table.endianformat which can have the value local, big, or little. The default is local.

  1. StandardStMan stores all the values in so-called buckets (equally sized chunks in the file). It requires little memory.
    It replaces the old StManAipsIO.

  2. IncrementalStMan uses a storage mechanism resembling "incremental backups". A value is only stored if it is different from the previous row. It is very well suited for slowly varying data.
    The class ROIncrementalStManAccessor can be used to tune the behaviour of the IncrementalStMan. It contains functions to deal with the cache size and to show the behaviour of the cache.

  3. The Tiled Storage Managers store the data as a tiled hypercube allowing for more or less equally efficient data access along all main axes. It can be used for UV-data as well as for image data.

  4. StManAipsIO uses AipsIO to store the data in the columns. It supports all table functionality, but its I/O is probably not as efficient as other storage managers. It also requires that a large part of the table fits in memory.
    It should not be used anymore, because it uses a lot of memory for larger tables and because it is not very robust in case an application or system crashes.

  5. MemoryStMan holds the data in memory. It means that data 'stored' with this storage manager are NOT persistent.
    This storage manager is primarily meant for tables held in memory, but it can also be useful for temporary columns in normal tables. Note, however, that if a table is accessed concurrently from multiple processes, MemoryStMan data cannot be synchronized.

  6. dyscostman::DyscoStMan is a class that stores data with lossy compression. It combines non-linear least-squares quantization and different kinds of normalization. With the typical factor of 4 compression, the loss in accuracy from lossy compression is negligible. It should only be used for real (non-simulated) data that is in a Measurement Set. The method is described in this article: https://arxiv.org/abs/1609.02019.

  7. Adios2StMan uses the ADIOS2 framework to store and load column data.
    ADIOS2 itself has several configurable storage backends, and this flexibility is also available via Adios2StMan. This includes, among other things, storing compressed data or choosing a different on-disk format.
    This storage manager is also special in that it provides parallel writing capabilities for MPI processes, so that multiple processes can write into different sections of the same column concurrently.
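
The "incremental backup" idea behind IncrementalStMan can be pictured with a small in-memory model. This is a toy sketch under the assumption that only value changes are recorded. It is not the file format or the API of IncrementalStMan, and the class name IncrementalColumn is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Toy model: a value is recorded only when it differs from the value
// in the previous row, together with the row where it starts.
class IncrementalColumn
{
public:
    // Append the value for the next row.
    void append(double value)
    {
        if (entries_.empty() || entries_.back().second != value) {
            entries_.emplace_back(nrow_, value);
        }
        ++nrow_;
    }

    // Value of a row: the last recorded entry starting at or before it.
    double get(std::size_t row) const
    {
        assert(row < nrow_);
        // Find the first entry that starts after the requested row.
        auto it = std::upper_bound(
            entries_.begin(), entries_.end(), row,
            [](std::size_t r, const Entry& e) { return r < e.first; });
        return std::prev(it)->second;
    }

    std::size_t nrow() const { return nrow_; }
    std::size_t nstored() const { return entries_.size(); }

private:
    using Entry = std::pair<std::size_t, double>;  // (start row, value)
    std::vector<Entry> entries_;
    std::size_t nrow_ = 0;
};
```

For a column that changes only a few times over many rows, nstored() stays small while nrow() grows, which is why this scheme suits slowly varying data.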

The storage manager framework makes it possible to support arbitrary files as tables. This has been used in a case where a file is filled by the data acquisition system of a telescope. The file is simultaneously used as a table using a dedicated storage manager. The table system and storage manager provide a sync function to synchronize the processes, i.e. to make CTDS aware of changes in the file size (thus in the table size) by the filling process.


Tip: Not all data managers support all the table functionality. So, the choice of a data manager can greatly influence the type of operations you can do on the table as a whole. For example, if a column uses the tiled storage manager, it is not possible to delete rows from the table, because that storage manager does not support deletion of rows. However, it is always possible to delete all columns of a data manager in one single call.

Tiled Storage Manager

The Tiled Storage Managers allow one to store the data of one or more columns in a tiled way. Tiling means that the data are stored without a preferred order to make access along the different main axes equally efficient. This is done by storing the data in so-called tiles (i.e. equally shaped subsets of an array) to increase data locality. The user can define the tile shape to optimize for the most frequently used access.
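
How a tile shape maps an array position to a tile can be sketched as follows. The layout used here (tiles and the elements inside a tile both ordered with the first axis varying fastest) is an assumption made for illustration. The actual on-disk layout of the Tiled Storage Managers is not described in this text, and the names TileLocation and locateInTiles are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct TileLocation {
    std::size_t tileIndex;  // linear index of the tile holding the element
    std::size_t offset;     // linear offset of the element inside that tile
};

// Locate an element of a hypercube that is cut into equally shaped tiles.
TileLocation locateInTiles(const std::vector<std::size_t>& cubeShape,
                           const std::vector<std::size_t>& tileShape,
                           const std::vector<std::size_t>& pos)
{
    assert(cubeShape.size() == tileShape.size());
    assert(cubeShape.size() == pos.size());
    TileLocation loc{0, 0};
    std::size_t tileStride = 1;  // tiles spanned by one step on this axis
    std::size_t elemStride = 1;  // elements spanned inside a tile
    for (std::size_t axis = 0; axis < cubeShape.size(); ++axis) {
        assert(tileShape[axis] > 0 && pos[axis] < cubeShape[axis]);
        // Number of tiles on this axis, counting a partly filled last tile.
        const std::size_t tilesOnAxis =
            (cubeShape[axis] + tileShape[axis] - 1) / tileShape[axis];
        loc.tileIndex += (pos[axis] / tileShape[axis]) * tileStride;
        loc.offset += (pos[axis] % tileShape[axis]) * elemStride;
        tileStride *= tilesOnAxis;
        elemStride *= tileShape[axis];
    }
    return loc;
}
```

With an 8x8 array and 4x4 tiles, a run of 4 elements along either axis stays within one tile, which is the data locality the tile shape is meant to provide.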

The Tiled Storage Manager has the following properties:

The following Tiled Storage Managers are available:

TiledShapeStMan
can be seen as a specialization of TiledDataStMan by using the array shape as the id value. Similarly to TiledDataStMan it can maintain multiple hypercubes and store multiple rows in a hypercube, but it is easier to use, because the special addHypercube and extendHypercube functions are not needed. A hypercube is automatically added when a new array shape is encountered.
This storage manager could be used for a table with a column containing line and continuum data, which will result in 2 hypercubes.
TiledCellStMan
creates (automatically) a new hypercube for each row. Thus each row of the hypercolumn is stored in a separate hypercube. Note that the row number serves as the id value. So an id column is not needed, although there are multiple hypercubes.
This storage manager is meant for tables where the data arrays in the different rows are not accessed together. One can think of a column containing images. Each row contains an image and only one image is shown at a time.
TiledColumnStMan
creates one hypercube for the entire hypercolumn. Thus all cells in the hypercube have to have the same shape and therefore this storage manager is only possible if all columns in the hypercolumn have the attribute FixedShape.
This storage manager could be used for a table with a column containing images for the Stokes parameters I, Q, U, and V. By storing them in one hypercube, it is possible to retrieve the 4 Stokes values for a subset of the image or for an individual pixel in a very efficient way.
TiledDataStMan
allows one to control the creation and extension of hypercubes. This is done by means of the class TiledDataStManAccessor. It makes it possible to store, say, row 0-9 in hypercube A, row 10-34 in hypercube B, row 35-54 in hypercube A again, etc.
The drawback of this storage manager is that its hypercubes are not automatically extended when adding new rows. The special functions addHypercube and extendHypercube have to be used, making it somewhat tedious to use. Therefore this storage manager may become obsolete in the near future.

The Tiled Storage Managers have 3 ways to access and cache the data. Class TSMOption can be used to setup an access choice and use it in a Table constructor.

Apart from reading, all access ways described above can also handle writing and extending tables. They create fully equal files. Both little and big endian data can be read or written.

Virtual Column Engines

Virtual column engines are used to implement the virtual (i.e. calculated-on-the-fly) columns. CTDS provides an abstract base class (or "interface class") VirtualColumnEngine that specifies the protocol for these engines. The programmer must derive a concrete class to implement the application-specific virtual column.

For example: the programmer needs a column in a table which is the difference between two other columns. (Perhaps these two other columns are updated periodically during the execution of a program.) A good way to handle this would be to have a virtual column in the table, and write a virtual column engine which knows how to calculate the difference between corresponding cells of the two other columns. So the result is that accessing a particular cell of the virtual column invokes the virtual column engine, which then gets the values from the other two columns, and returns their difference. This particular example could be done using VirtualTaQLColumn.
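
The difference column from this example can be modelled in a few lines of plain C++. This is only a conceptual sketch of "calculated on the fly". It does not use the VirtualColumnEngine interface, and the class name DifferenceColumn is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a virtual column: nothing is stored, each cell is
// computed on access from the corresponding cells of two source columns.
class DifferenceColumn
{
public:
    // The source columns are referenced, not copied, so later updates
    // to them are visible through this column. They must outlive it.
    DifferenceColumn(const std::vector<double>& first,
                     const std::vector<double>& second)
      : first_(first), second_(second) {}

    std::size_t nrow() const
    {
        return first_.size() < second_.size() ? first_.size()
                                              : second_.size();
    }

    // Compute the cell value when it is asked for.
    double get(std::size_t row) const
    {
        assert(row < nrow());
        return first_[row] - second_[row];
    }

private:
    const std::vector<double>& first_;
    const std::vector<double>& second_;
};
```

Because the value is computed at access time, an update of either source column is reflected immediately without rewriting the virtual column.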

Several virtual column engines exist:

  1. The class VirtualTaQLColumn makes it possible to define a column as an arbitrary expression of other columns. It uses the TaQL CALC command. The virtual column can be a scalar or an array and can have one of the standard data types supported by CTDS.
  2. The class BitFlagsEngine maps an integer bit flags column to a Bool column. A read and write mask can be defined telling which bits to take into account when mapping to and from Bool (thus when reading or writing the Bool).
  3. The class CompressFloat compresses a single precision floating point array by scaling the values to shorts (16-bit integer).
  4. The class CompressComplex compresses a single precision complex array by scaling the values to shorts (16-bit integer). In fact, the 2 parts of the complex number are combined to a 32-bit integer.
  5. The class CompressComplexSD does the same as CompressComplex, but optimizes for the case where the imaginary part is zero (which is often the case for Single Dish data).
  6. The double templated class ScaledArrayEngine scales the data in an array from, for example, float to short before putting it.
  7. The double templated class MappedArrayEngine converts the data from one data type to another. Sometimes it might be needed to store the residual data in an MS in double precision. Because the imaging task can only handle single precision, this engine can be used to map the data from double to single precision.
  8. The double templated class RetypedArrayEngine converts the data from one data type to another with the possibility to reduce the number of dimensions. For example, it can be used to store a 2-d array of StokesVector objects as a 3-d array of floats by treating the 4 data elements as an extra array axis. If the StokesVector class is simple, it can be done very efficiently.
  9. The class ForwardColumnEngine forwards the gets and puts on a row in a column to the same row in a column with the same name in another table. This provides a virtual copy of the referenced column.
  10. The class ForwardColumnIndexedRowEngine is similar to ForwardColumnEngine. However, instead of forwarding it to the same row, it uses a column to map its row number to a row number in the referenced table. In this way multiple rows can share the same data. This data manager only allows for get operations.

  11. The calibration module has implemented a virtual column engine to do on-the-fly calibration in a transparent way.
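
The idea of scaling floats to 16-bit integers, as done by engines such as CompressFloat and ScaledArrayEngine, can be sketched as below. The way scale and offset are chosen here (from the minimum and maximum of the data) is an assumption for illustration and is not taken from those classes. The names ScaledShorts, compressToShorts and expandFromShorts are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct ScaledShorts {
    float scale = 1.0f;                // value step per integer unit
    float offset = 0.0f;               // value represented by integer 0
    std::vector<std::int16_t> values;  // the stored 16-bit integers
};

// Map the data range linearly onto [-32767, 32767].
ScaledShorts compressToShorts(const std::vector<float>& data)
{
    ScaledShorts out;
    if (data.empty()) return out;
    const auto range = std::minmax_element(data.begin(), data.end());
    const float lo = *range.first;
    const float hi = *range.second;
    out.offset = (lo + hi) / 2.0f;
    if (hi > lo) out.scale = (hi - lo) / 65534.0f;
    assert(out.scale > 0.0f);
    out.values.reserve(data.size());
    for (float v : data) {
        long q = std::lround((v - out.offset) / out.scale);
        // Guard against rounding just outside the 16-bit range.
        q = std::max(-32767L, std::min(32767L, q));
        out.values.push_back(static_cast<std::int16_t>(q));
    }
    return out;
}

// Reverse the mapping. The result is accurate to about half a scale step.
std::vector<float> expandFromShorts(const ScaledShorts& in)
{
    std::vector<float> data;
    data.reserve(in.values.size());
    for (std::int16_t q : in.values) {
        data.push_back(in.offset + in.scale * static_cast<float>(q));
    }
    return data;
}
```

The storage is halved compared to single precision floats, at the cost of a quantization error that depends on the range of the data.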

To handle arbitrary data types the templated abstract base class VSCEngine has been written. An example of how to use this class can be found in the demo program dVSCEngine.cc.

Table locking and synchronization

Multiple concurrent readers and writers (also via NFS) of a table are supported by means of a locking/synchronization mechanism. This mechanism is not very sophisticated in the sense that it is very coarsely grained. When locking, the entire table gets locked. A special lock file is used to lock the table. This lock file also contains some synchronization data.

The following ways of locking are supported (see class TableLock):

TableLock::PermanentLocking(Wait)
locks the table permanently (from open till close). This means that one writer OR multiple readers are possible.
TableLock::AutoLocking
does the locking automatically. This is the default mode. This mode makes it possible that a table is shared amongst processes without the user needing to write any special code. It also means that a lock is only released when needed.
TableLock::AutoNoReadLocking
is similar to AutoLocking. However, no lock is acquired when reading the table making it possible to read the table while another process holds a write-lock. It also means that for read purposes no automatic synchronization is done when the table is updated in another process. Explicit synchronization can be done by means of the function Table::resync.
TableLock::UserLocking
requires that the programmer explicitly acquires and releases a lock on the table. This makes some kind of transaction processing possible. E.g. set a write lock, add a row, write all data into the row and release the lock. The Table functions lock and unlock have to be used to acquire and release a (read or write) lock.
TableLock::UserNoReadLocking
is similar to UserLocking. However, similarly to AutoNoReadLocking no lock is needed to read the table.
TableLock::NoLocking
does not use table locking. It is the responsibility of the user to ensure that no concurrent access is done on the same bucket or tile in a storage manager, otherwise a table might get corrupted.
This mode is always used if Casacore is built with -DAIPS_TABLE_NOLOCKING.

Synchronization of the processes accessing the same table is done by means of the lock file. When a lock is released, the storage managers flush their data into the table files. Some synchronization data is written into the lock file telling the new number of table rows and telling which storage managers have written data. This information is read when another process acquires the lock and is used to determine which storage managers have to refresh their internal caches.
Note that for the NoReadLocking modes (see above) explicit synchronization might be needed using Table::resync.

The function Table::hasDataChanged can be used to check if a table is (being) changed by another process. In this way a program can react on it. E.g. the table browser can refresh its screen when the underlying table is changed.

In general the default locking option will do. From the above it should be clear that heavy concurrent access results in a lot of flushing, thus will have a negative impact on performance. If uninterrupted access to a table is needed, the PermanentLocking option should be used. If transaction-like processing is done (e.g. updating a table containing an observation catalogue), the UserLocking option is probably best.

Creation or deletion of a table is not possible if that table is still open in another process. The function Table::isMultiUsed() can be used to check if a table is open in other processes.
The function TableUtil::deleteTable should be used to delete a table. Before deleting the table it ensures that it is writable and that it is not open in the current or another process.

The following example wants to read the table uninterrupted, thus it uses the PermanentLocking option. It also wants to wait until the lock is actually acquired. Note that the destructor closes the table and releases the lock.

// Open the table (readonly).
// Acquire a permanent (read) lock.
// It waits until the lock is acquired.
Table tab ("some.name",
TableLock(TableLock::PermanentLockingWait));

The following example uses the automatic locking. It tells the system to check about every 20 seconds if another process wants access to the table.

// Open the table (readonly).
Table tab ("some.name",
TableLock(TableLock::AutoLocking, 20));

The following example gets data (say from a GUI) and writes it as a row into the table. To lock the table as little as possible, the lock is acquired just before writing and released immediately thereafter.

// Open the table (writable) with user locking.
Table tab ("some.name",
TableLock(TableLock::UserLocking),
Table::Update);
while (True) {
// ... get input data ...
tab.lock(); // Acquire a write lock and wait for it.
tab.addRow();
// ... write data into the row ...
tab.unlock(); // Release the lock.
}
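
With user locking it is easy to forget the unlock on an early return or an exception. A possible way to guard against that is a small scope guard, sketched here under the assumption that the table type offers lock() and unlock() as described above. ScopedTableLock and FakeTable are hypothetical names. FakeTable only stands in for a table so that the sketch compiles without Casacore.

```cpp
#include <cassert>

// Hypothetical scope guard: acquires the lock on construction and
// releases it when the guard goes out of scope.
template <typename TableType>
class ScopedTableLock
{
public:
    explicit ScopedTableLock(TableType& table) : table_(table)
    {
        table_.lock();
    }
    ~ScopedTableLock() { table_.unlock(); }

    // A lock guard must not be copied.
    ScopedTableLock(const ScopedTableLock&) = delete;
    ScopedTableLock& operator=(const ScopedTableLock&) = delete;

private:
    TableType& table_;
};

// Stand-in for a table, only counting what happens to it.
struct FakeTable {
    int lockDepth = 0;
    int rows = 0;
    void lock() { ++lockDepth; }
    void unlock()
    {
        assert(lockDepth > 0);
        --lockDepth;
    }
    void addRow() { ++rows; }
};
```

The body of the loop in the example above would then become a block that creates the guard, adds the row and writes the data. The lock is released at the closing brace of that block.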

The following example deletes a table if it is not used in another process.

Table tab ("some.name");
if (! tab.isMultiUsed()) {
tab.markForDelete();
}

Table lookup based on a key

Class ColumnsIndex offers the user a means to find the rows matching a given key or key range. It is a somewhat primitive replacement of a B-tree index and in the future it may be replaced by a proper B+-tree implementation.

The ColumnsIndex class makes it possible to build an in-core index on one or more columns. Looking up a key or key range is done using a binary search on that index. It returns a vector containing the row numbers of the rows matching the key (range).

The class is not capable of tracing changes in the underlying column(s). It detects a change in the number of rows and updates the index accordingly. However, it has to be told explicitly when a value in the underlying column(s) changes.

The following example shows how the class can be used.

Example

Suppose one has an antenna table with key ANTENNA.

// Open the table and make an index for column ANTENNA.
Table tab("antenna.tab");
ColumnsIndex colInx(tab, "ANTENNA");
// Make a RecordFieldPtr for the ANTENNA field in the index key record.
// Its data type has to match the data type of the column.
RecordFieldPtr<Int> antFld(colInx.accessKey(), "ANTENNA");
// Now loop in some way and find the row for the antenna
// involved in that loop.
Bool found;
while (...) {
// Fill the key field and get the row number.
// ANTENNA is a unique key, so only one row number matches.
// Otherwise function getRowNumbers had to be used.
*antFld = antenna;
uInt antRownr = colInx.getRowNumber (found);
if (!found) {
cout << "Antenna " << antenna << " is unknown" << endl;
} else {
// antRownr can now be used to get data from that row in
// the antenna table.
}
}
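
The principle of such an in-core index (a sorted copy of the key values, searched with a binary search) can be sketched in plain C++. This is a simplified model for a single integer column and not the implementation of ColumnsIndex. The class name SimpleColumnIndex is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy in-core index on one integer column.
class SimpleColumnIndex
{
public:
    using Entry = std::pair<int, std::size_t>;  // (key value, row number)

    // Build the index: copy the keys with their row numbers and sort.
    explicit SimpleColumnIndex(const std::vector<int>& column)
    {
        entries_.reserve(column.size());
        for (std::size_t row = 0; row < column.size(); ++row) {
            entries_.emplace_back(column[row], row);
        }
        std::sort(entries_.begin(), entries_.end());
    }

    // Row numbers of all rows with lower <= key <= upper.
    std::vector<std::size_t> getRowNumbers(int lower, int upper) const
    {
        auto first = std::lower_bound(
            entries_.begin(), entries_.end(), lower,
            [](const Entry& e, int key) { return e.first < key; });
        auto last = std::upper_bound(
            first, entries_.end(), upper,
            [](int key, const Entry& e) { return key < e.first; });
        std::vector<std::size_t> rows;
        for (auto it = first; it != last; ++it) rows.push_back(it->second);
        return rows;
    }

    // Lookup of a key that is expected to be unique.
    bool getRowNumber(int key, std::size_t& row) const
    {
        const std::vector<std::size_t> rows = getRowNumbers(key, key);
        assert(rows.size() <= 1);
        if (rows.empty()) return false;
        row = rows.front();
        return true;
    }

private:
    std::vector<Entry> entries_;
};
```

Like the class described above, this model does not notice changes in the underlying column. The index has to be rebuilt when a key value changes.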

ColumnsIndex itself contains a more advanced example. It shows how to use a private compare function to adjust the lookup if the index does not contain single key values, but intervals instead. This is useful if a row in a (sub)table is valid for, say, a time range instead of a single timestamp.

Performance and robustness considerations

CTDS resembles a database system, but it is not as robust. It lacks the transaction and logging facilities common to data base systems. It means that in case of a crash data might be lost. To reduce the risk of data loss to a minimum, it is advisable to regularly do a flush, optionally with an fsync to ensure that all data are really written. However, that can degrade the performance because it involves extra writes. So one should find the right balance between robustness and performance.

To get a good feeling for the performance issues, it is important to understand some of the internals of CTDS.
The storage managers drive the performance. All storage managers use buckets (called tiles for the TiledStMan) which contain the data. All IO is done by bucket. The bucket/tile size is defined when creating the storage manager objects. Sometimes the default will do, but usually it is better to set it explicitly.

It is best to do a flush when a tile is full. For example:
When creating a MeasurementSet containing N antennae (thus N*(N-1)/2 baselines, or N*(N+1)/2 if auto-correlations are stored as well) it makes sense to store, say, N/2 rows in a tile and do a flush each time all baselines are written. In that way tiles are fully filled when doing the flush, so no extra IO is involved.
Here is some code showing this when creating a MeasurementSet. The code should speak for itself.

MS* createMS (const String& msName, int nrchan, int nrant)
{
// Get the MS main default table description.
TableDesc td = MS::requiredTableDesc();
// Add the data column and its unit.
td.rwColumnDesc(MS::columnName(MS::DATA)).rwKeywordSet().
define("UNIT","Jy");
// Store the DATA and FLAG column in two separate files.
// In this way accessing FLAG only is much cheaper than
// when combining DATA and FLAG.
// All data have the same shape, thus use TiledColumnStMan.
// Also store UVW with TiledColumnStMan.
Vector<String> tsmNames(1);
tsmNames[0] = MS::columnName(MS::DATA);
// The arrays hold 4 correlations and nrchan channels (matching the tile shapes below).
td.rwColumnDesc(tsmNames[0]).setShape (IPosition(2,4,nrchan));
td.defineHypercolumn("TiledData", 3, tsmNames);
tsmNames[0] = MS::columnName(MS::FLAG);
td.rwColumnDesc(tsmNames[0]).setShape (IPosition(2,4,nrchan));
td.defineHypercolumn("TiledFlag", 3, tsmNames);
tsmNames[0] = MS::columnName(MS::UVW);
td.defineHypercolumn("TiledUVW", 2, tsmNames);
// Setup the new table.
SetupNewTable newTab(msName, td, Table::New);
// Most columns vary slowly and use the IncrStMan.
IncrementalStMan incrStMan("ISMData");
// A few columns use the StandardStMan (set an appropriate bucket size).
StandardStMan stanStMan("SSMData", 32768);
// Store all pol and freq and some rows in a single tile.
// Autocorrelations are written, thus in total there are
// nrant*(nrant+1)/2 baselines. Ensure the baselines of a time slot
// take up an integer number of tiles.
TiledColumnStMan tiledData("TiledData",
IPosition(3,4,nrchan,(nrant+1)/2));
TiledColumnStMan tiledFlag("TiledFlag",
IPosition(3,4,nrchan,8*(nrant+1)/2));
TiledColumnStMan tiledUVW("TiledUVW",
IPosition(2,3,nrant*(nrant+1)/2));
newTab.bindAll (incrStMan);
newTab.bindColumn(MS::columnName(MS::ANTENNA1),stanStMan);
newTab.bindColumn(MS::columnName(MS::ANTENNA2),stanStMan);
newTab.bindColumn(MS::columnName(MS::DATA),tiledData);
newTab.bindColumn(MS::columnName(MS::FLAG),tiledFlag);
newTab.bindColumn(MS::columnName(MS::UVW),tiledUVW);
// Create the MS and its subtables.
// Get access to its columns.
MS* msp = new MeasurementSet(newTab);
// Create all subtables.
// Do this after the creation of optional subtables,
// so the MS will know about those optional subtables.
msp->createDefaultSubtables (Table::New);
return msp;
}
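
The tile arithmetic used in this example can be checked with a few lines of plain C++. The sketch only restates the formulas from the text and the code above (nrant*(nrant+1)/2 baselines including auto-correlations and (nrant+1)/2 rows per DATA tile). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of baselines for nrant antennae.
std::size_t nrBaselines(std::size_t nrant, bool withAutoCorr)
{
    return withAutoCorr ? nrant * (nrant + 1) / 2
                        : nrant * (nrant - 1) / 2;
}

// Rows per tile as used for the DATA column in the example above.
std::size_t rowsPerTile(std::size_t nrant)
{
    assert(nrant > 0);
    return (nrant + 1) / 2;
}

// True if the baselines of one time slot exactly fill a whole number
// of tiles, so that a flush after each time slot writes full tiles only.
bool fillsWholeTiles(std::size_t nrant)
{
    return nrBaselines(nrant, true) % rowsPerTile(nrant) == 0;
}
```

With integer division the chosen number of rows per tile always divides nrant*(nrant+1)/2, for even as well as odd nrant, so each time slot ends on a tile boundary.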

Some more performance considerations

Which storage managers to use and how to use them depends heavily on the type of data and the access patterns to the data. Here follow some guidelines:

  1. Scalar data can be stored with the StandardStMan (SSM) or IncrementalStMan (ISM). For slowly varying data (e.g. the TIME column in a MeasurementSet) it is best to use the ISM. Otherwise use the SSM. Note that very long strings (longer than the bucket size) can only be stored with the SSM.
  2. Any number of storage managers can be used. In fact, each column can have a storage manager of its own resulting in column-wise stored data which is more and more used in data base systems. In that way a query or sort on that column is very fast, because the buckets to read only contain data of that column. In practice one can decide to combine a few frequently used columns in a storage manager.
  3. Array data can be stored with any column manager. Small fixed size arrays can be stored directly with the SSM (or ISM if not changing much). However, they can also be stored with a TiledStMan (TSM) as shown for the UVW column in the example above.
    Large arrays should usually be stored with a TSM. However, if it must be possible to change the shape of an array after it was stored, the SSM (or ISM) must be used. Note that in that case a lot of disk space can be wasted, because the SSM and ISM store the array data at the end of the file if the array got bigger and do not reuse the old space. The only way to reclaim it is by making a deep copy of the entire table.
  4. If an array is stored with a TSM, it is important to decide which TSM to use.
    1. The TiledColumnStMan is the most efficient, but only suitable for arrays having the same shape in the entire column.
    2. The TiledShapeStMan is suitable for columns where the arrays can have a few shapes.
    3. The TiledCellStMan is suitable for columns where the arrays can have many different shapes.
    This is discussed in more detail above.
  5. If storing an array with a TSM, it can be very important to choose the right tile shape. Not only does this define the size of a tile, but it also defines if access in other directions than the natural direction can be fast. It is also discussed in more detail above.
  6. Columns can be combined in a single TiledStMan. For instance, combining DATA and FLAG is advantageous if FLAG is always used with DATA. However, if FLAG is used on its own (e.g. in combination with CORRECTED_DATA), it is better to separate them, otherwise tiles containing FLAG also contain DATA making the tiles much bigger, thus more expensive to access.

IO Tracing

Several forms of tracing can be done to see how the Table I/O performs.

Applications to inspect/manipulate a table