NOTE 260 – CTDS File Formats

Ger van Diepen, ASTRON Dwingeloo

November 10, 2017

Abstract

The Casacore Table Data System (CTDS) uses several files to store data in. Each storage manager has its own files. The document describes the formats of all files that can be used by CTDS.

A pdf version of this note is available.

Contents

1 Introduction
2 AipsIO
 2.1 Array object
 2.2 IPosition object
 2.3 Block and std::vector object
 2.4 std::map object
 2.5 TableRecord object
3 PlainTable
 3.1 General file table.dat
  3.1.1 Table Description
  3.1.2 ColumnSet
 3.2 Info file table.info
 3.3 Lock file table.lock
 3.4 Storage Managers
  3.4.1 StandardStMan
  3.4.2 IncrementalStMan
  3.4.3 TiledStMan
  3.4.4 AipsIOStMan
  3.4.5 Indirect Array File
 3.5 Virtual Column Engines
  3.5.1 ForwardColumnEngine
  3.5.2 ScaledComplex
  3.5.3 VirtualTaQLColumn
4 RefTable
5 ConcatTable
6 MultiFile Format
 6.1 MultiFile
  6.1.1 MultiFile version 1
  6.1.2 MultiFile version 2
 6.2 MultiHDF5

1 Introduction

The Casacore Table Data System (CTDS) is an RDBMS-like system to store data in tables consisting of a number of columns and rows. The table and each column can have an associated set of keywords to define global table data (such as subtable names) or column specific data (such as units). The keywords are meant for small amounts of data. The bulk data will be stored in the column-row cells.

Besides the usual scalar data, a table column or keyword can hold N-dimensional arrays. The data type of a keyword and scalar or array column can be:

Besides these basic types it is also possible to store (Table)Record objects which are basically maps of name strings to values of one of the data types mentioned above. The records can be nested arbitrarily deeply. When stored in a column, a record is serialized (using AipsIO) to a uChar array and stored as such.

There is a strict distinction between the logical and physical model of the data columns. The table description defines the logical model, while data managers (storage managers and virtual data managers) implement the physical model.

On disk a CTDS table is a directory containing a number of files to hold data and meta data. The formats of these files vary and can have quite complicated structures.
A table can have a zero or more subtables that are usually stored as a subdirectory of the main table. A good example of such a table is the Casacore MeasurementSet (see note 229.

A CTDS table can be a reference to one or more other tables which is comparable to a view in a relational data base. CTDS supports a few types of tables.

On disk all table types consist of a directory with the name of the table and several files. The following two files are always present. A PlainTable can contain several more files.

This document describes the format of all files that are part of the core CTDS. Note that CTDS can also use arbitrary storage managers dynamically loaded from a shared library. The file formats of such third party storage managers are not described in this document.

CTDS is designed such that it is fully backward compatible. Version information make it possible that even tables from the very first days can be read back despite the fact that the format has changed considerably.
CTDS understands both little and big endian data representations. At table creation time it is decided which one to use which normally is the local endian type.

2 AipsIO

CTDS makes heavily use of the AipsIO class to store its meta data. Therefore a brief description of the AipsIO format is given.
AipsIO is basically a mechanism to serialize a C++ object into a stream of bytes and to read it back. The format consists of a header followed by the object’s data. The objects can be nested arbitrary deeply. The maximum total size is 4 GBytes.
The header contains the following fields:



Data TypeDescription




uInt Object length (including the header)
String Object type
uInt Version


The version can be used to make software fully backward compatible in case an object changes over time. This feature is used a lot by CTDS to be able to access tables created by older CTDS versions.
The object data are stored after the header as a stream of bytes. The data are unaligned in the stream, thus, say, a float does not need to start at a multiple of 4 bytes.
The length of an object is the total length; it includes the length of the header and the length of possibly nested objects. Of course, each nested object has its own header.
The following native data types are supported. Other data types (classes) can be supported by writing the appropriate shift functions (similar to the C++ std::iostream). A few such objects used by CTDS are described in subsequent sections.



Bool

1 byte

Char and uChar

1 byte

Short and uShort

2 bytes

Int and uInt

4 bytes

Int64 and uInt64

8 bytes

Float

4 bytes IEEE

Double

8 bytes IEEE

Complex

2 Floats

DComplex

2 Doubles

String

uInt length followed by its characters (can be length 0)

C-array

uInt length followed by its elements (can be length 0).

A bool C-array is compressed to bits.



AipsIO can store the data in little or big endian order. In fact, it can also handle the old VAX/VMS and IBM/360 data formats.
CTDS only uses the canonical AipsIO format (which is big endian).

2.1 Array object

Templated Array <T > objects are used quite heavily in Casacore. Only arrays with a size up to 2**31 bytes can be stored in AipsIO. An Array object is stored as follows.



header

Array (version 1, 2, or 3)

uInt

number of dimensions

uInt[ndim]

ndim shape values

Int[ndim]

origin (only present for version < 3); it is ignored

T[nelem]

all array values. Bools are compressed to bits.



2.2 IPosition object

An IPosition object defines a shape of an array or a location in an array. It is stored in AipsIO as follows.



header

IPosition (version 1 or 2)

uInt

number of dimensions

Int[ndim]

shape (for version 1)

Int64[ndim]

shape (for version 2)



Until 2009 a shape element was represented as a 32-bit integer, thereafter as a 64-bit integer. To be as forward compatible as possible, an IPosition is still stored as version 1 if all elements in the shape have a value fitting in a 32-bit integer.

2.3 Block and std::vector object

The templated Casacore Block class is basically the same as the std::vector class. Both are stored in the same way in AipsIO, thus a Block can be read back as a std::vector and vice versa.



header

Block (version 1)

uInt

number of elements

T[n]

the elements



Note that the number of elements is written as an unsigned 32 bit number. This is sufficient since an AipsIO object cannot exceed 4 GB.

2.4 std::map object

Casacore had its own SimpleOrderedMap class before the C++ standard library with its std::map class was developed. Nowadays Casacore only uses std::map, but for backward compatibility an std::map object is serialized in AipsIO in the old SimpleOrderedMap way.
A std::map¡K,V¿ object is stored in AipsIO as follows.



header

SimpleOrderedMap (version 1)

V

default value; not used

uInt

nentries (number of key-value pairs)

uInt

increment (number of entries to add when the map is extended); not used

KV[nentries]

nentries key-value pairs (first the key, then the value)



2.5 TableRecord object

A TableRecord object holds a set of keyword/value pairs. They are used in CTDS to represent the keyword set attached to the table and each column.
A value can be a scalar or array of one the standard data types including complex and string values. Furthermore a value can be a TableRecord in itself; the nesting can be arbitrarily deep. Finally a value can be a Table object which is used to hold the subtables of a table.

A TableRecord is described by a RecordDesc object that also contains the descriptions of possible nested records. A TableRecord can have a fixed or variable format. In CTDS only variable format TableRecords are used.
In the very early days of CTDS a keyword set was stored in a very different way. Reading back such keyword sets is still supported in the code.

A TableRecord is stored in AipsIO as follows:



Data Type

Description





AipsIO header

TableRecord (version 1)

RecordDesc

The TableRecord description

Int

record type (CTDS always uses variable format records)

any

All values are stored consecutively. Their data types are defined in the RecordDesc.

Scalars and arrays are stored in the standard AipsIO way.

For a nested TableRecord only the values are stored, because its description is part of the record description. However, a nested empty TableRecord is stored as such.

If a value is a Table, only the table name is stored. It is tried to make the name relative to the name of the parent table. In that way moving a table has no effect its subtable names. A relative name is prefixed with ././ if the full parent table name has to be used as directory, while the prefix ./ is used if the directory name of the parent table has to be used. Note that in all practical cases (like a MeasurementSet) ././ is used as prefix.



The RecordDesc is stored as follows:



Data Type

Description





AipsIO header

RecordDesc (version 1 or 2)

uInt

number of keywords

String

keyword name

Int

data type (from DataType.h)

Specific

specific info for a value containing an array, Table, or Record.

- shape for an array (shape [-1] means it is not fixed). A shape is stored as a vector of Int values.

- TableDesc name for a Table (empty name means it is not fixed).

- description for a Record.

String

comment to explain the meaning of a keyword (not in version 1)



The latter 4 fields are repeated for each keyword.

3 PlainTable

A PlainTable object represents a basic CTDS table with its own keywords, columns, and associated data managers. It is a directory containing several files named table.*.
Besides the files table.dat and table.info a PlainTable consists of the following files.

Note it is possible to combine these files into a single so-called MultiFile to reduce the number of actual files.

3.1 General file table.dat

This file is the main table file. It contains the table definition, the keywords and their values, and the binding of columns to data managers. It is basically a nested AipsIO object containing the following fields.



Data Type Description




AipsIO headerTable (version 1 or 2)
uInt number of table rows
uInt endianness (0=little, 1=big)
String table type (PlainTable)
TableDesc table description and table keyword set
TableRecord table keyword set (in version 1 only)
ColumnSet column data manager info and column keyword sets


Note that the set of table and column keywords is part of the table description. Only in the very first CTDS version (before 1995), the table keyword set was stored separately. For backward compatibility it is still possible to read such a keyword set, but it is not stored anymore.

It is instructive to use the UNIX strings command on a table.dat file. It shows all the strings mentioned in this section.

3.1.1 Table Description

The TableDesc object contains the table description defining the logical view of a table. It contains the name, data type, and some extra info of each column. Furthermore it contains the keyword set attached to the table and each column. Finally it contains a private keyword set that is used by the TableDesc object to hold some extra meta data.



Data Type

Description





AipsIO header

TableDesc (version 1 or 2)

String

name of table description

String

version info of table description

String

comment about table description

TableRecord

table keyword set

TableRecord

private table keyword set (not in version 1)

uInt

ncol (number of columns)

ColumnDesc[ncol]

description of each column



A ColumnDesc object defines a column and is stored as follows.



Data Type

Description





uInt

ColumnDesc version (=1)

String

Type of column using the name of the object containing the description. It can have one of the following values:

- ScalarColumnDesc <T

- ArrayColumnDesc <T

- ScalarRecordColumnDesc

- SubTableDesc

where T is a string defining the type. It can be: Bool, uChar, Short, uShort, Int, uInt, Int64, float, double, Complex, DComplex, and String.



uInt

BaseColumnDesc version (=1)

String

column name

String

comment

String

default data manager type

String

default data manager group

Int

data type (enum from DataType.h, e.g., TpInt). It must match the type in the column type string.

Int

column options

1 = direct array; meant for small fixed sized arrays for which a storage manager can decide to put it directly.

2 = undefined value can exist (not used).

4 = array has a fixed shape.

uInt

array dimensionality (-1 means scalar)

IPosition

fixed array shape (only for columns containing arrays)

uInt

maximum length of a string value (0 = no maximum)

TableRecord

column keyword set



For a column containing scalars


uInt

ScalarColumnDesc version (=1)

T

default value (stored according to the data type)



For a column containing arrays


uInt

ArrayColumnDesc version (=1)

Bool

dummy flag (always False); is present for backward compatibility



For a column containing records


uInt

ScalarRecordColumnDesc version (=1)



For a column containing subtables


uInt

SubTableDesc version (=1)

String

TableDesc name

Bool

Flag telling how the description of the subtable is found.

True = table description is by name, thus in a file with that name.

False = table description is in the next field.

TableDesc

the description of the subtable (only if flag=False)



3.1.2 ColumnSet

The ColumnSet class contains the information how the logical table columns are mapped to their physical counterparts served by the data managers. As part of the table.dat file it is written into the TableDesc’s AipsIO as follows,



Data Type

Description





Int

version as a negative number (only for version > 1)

uInt or Int64

number of rows (Int64 for version >= 3)

Int

StorageOption::Option); only for version >= 3

uInt

StorageOption::blockSize; only for version >= 3

uInt

highest data manager sequence number used

uInt

nrdm; number of data managers used



for each data manager (nrdm entries):

string

name of the data manager

uInt

sequence number of the data manager (as used in f<i> in file name)



ColumnInfo[nrcol]

info about each column (see below)

DMInfo[nrdm]

info about each data manager (see below)



ColumnInfo
For each column some information is stored in the ColumnSet’s AipsIO structure.



Data Type

Description





Int

version (1 or 2)

TableRecord

column keywordset; only for version 1

String

original column name (the name used at table creation, thus without a possible rename)

derived column info



uInt

version (1)

uInt

data manager sequence number



for array columns


Bool

shapeColDef; has the column a fixed shape?

IPosition

the shape of each array in the column; only if shapeColDef is True



DMInfo
Each data manager can write some short information into the ColumnSet’s AipsIO structure. This is kept as an array of bytes, stored as a length (uInt) followed by the bytes. The data manager is responsible for the coding and decoding of these bytes.

Most data managers do not write anything, thus for them only the length 0 is stored. Only the StandardStMan and IncrementalStMan write some info which is described in the Header paragraphs of the respective sections.

3.2 Info file table.info

This file is a text file giving some brief info about the table. It is read and written by class TableInfo.
It contains the following lines:

3.3 Lock file table.lock

CTDS supports concurrent access to a table. One writer or multiple readers can be active at the same time. A lock needs to be acquired to access the table, although there is a mode to bypass locking when reading or to bypass locking entirely. See note 256 for more info about how locking is done.

The first few bytes in the lock file are used by the concurrency mechanism of CTDS to manage a read or write lock (on the entire table).

The lock file contains two data structures to assist processes in acquiring locks and synchronizing their internal data structures with table data changes made in another process. All data are written in big-endian format.



Data Type

Description





AipsIO header

sync (version 1 or 2)

uInt or uInt64

number of table rows (uInt in version 1; uInt64 in version 2)

uInt

number of table columns

uInt

modify counter

uInt

table change counter. The counter is incremented if the table.dat file has changed (e.g. by adding a keyword).

Block <uInt >

change counter per data manager. A counter is incremented if that data manager has written data.



A process acquiring a lock detects if the main table file or a data manager file has changed by comparing its change counters with the ones in the lock file.

3.4 Storage Managers

There are several storage managers, each writing the data in its own way. They write their data in files called table.f<i> where <i> is the sequence number of the storage manager. The sequence number can be seen using the program showtableinfo. As described above storage managers can use extra files, in particular an indirect array file.

3.4.1 StandardStMan

The data of the columns bound to this storage manager are stored in equally sized buckets. The buckets are stored sequentially in the file, so the bucket number determines the file offset. Indices tell which rows and columns are contained in the buckets.

StandardStMan uses the main file table.f<i> in the table directory where <i> is the sequence number. For indirect arrays it uses another file called table.f<i>i. The main file consists of four types of data:

However, this initial simple scheme can get more complex when rows and/or columns are added or removed. Such operations can have the effect that buckets do not contain the same number of rows and that possibly new indices are created for groups of new columns. The storage manager will maintain a list of free buckets in case all data from a bucket are removed.

It is possible to remove rows, which is done by removing the row from its bucket and shifting the other rows in that bucket. Other buckets are not touched. Therefore not all buckets need to contain the same number of rows. An index block tells the starting row of each bucket. If all rows of a bucket are removed, the bucket is added to the empty bucket list. An empty bucket can be reused in a new bucket needs to be acquired when adding data to the table.

Initially all columns are combined in a single set and are stored jointly in the buckets. However, if columns are added to a table, they do not fit in the current set of buckets. They will be stored as a new set of columns having its own index block.

Small fixed sized data are stored directly in a data bucket. Such data are numeric scalar values and small arrays such as UVW coordinates. Small strings ( <= 8 bytes) and fixed sized strings are also stored directly. Larger strings are stored in heap buckets which are referred to in the data buckets. Strings can be span multiple heap buckets, thus it is possible to store extremely large strings.

Header
The header block is contained in the first 512 bytes of the table.f<i> file. It contains the following AipsIO structure.



Data Type

Description





AipsIO header

StandardStMan (version 1, 2, 3 or 4)

Bool

data stored in big endian format? (not present for version <= 2)

uInt

bucket size in bytes

uInt

number of buckets

uInt

Persistent cache size (in buckets)

uInt

number of free buckets

Int

first free bucket (-1 is no free bucket)

uInt

number of index buckets

Int

first index bucket

uInt

offset of index in bucket (not present for version 1); if >0, index fits in a single bucket

Int

last string heap bucket (-1 is no string heap)

uInt

index length (in bytes)

uInt

number of indices



DMInfo
StandardStMan stores the following data manager info in the DMInfo part of the ColumnSet AipsIO structure.



uInt[ncolumn]

the number of the column set each column belongs to

uInt[ncolumn]

the offset of each column in the data buckets



The number of columns is defined in the table description.

Data Buckets
The columns handled by this storage manager are grouped into sets. At table creation time there is a single set, but sets might be added as columns are added to this storage manager at a later stage. Similarly, sets might be removed as columns are removed. Each set of columns has its own set of data buckets and its own index to describe the contents of the buckets.

For a particular set of columns a bucket can store N rows where N depends on the size of the bucket and the sizes of the columns’ data types. The data are stored in a columnar way in each bucket, thus each bucket contains N rows of column 1, thereafter N rows of column 2, etc. without any padding. Boolean values are storeds as bits. Note that the column data types are defined in the Table Description and not in the storage manager. The offset of a column is fixed and the same in the data buckets is fixed. It will never change, even when rows or columns are removed. In principle the first bucket contains rows 0..N, the next bucket contains rows N..2N, etc. However, this can change as rows are deleted. Row deletion only affects the bucket the row is in.
The index defines which rows are contained in the buckets by keeping the last row number for each bucket. The index is described in a next paragraph.

Data with a fixed length are stored directly in the data buckets, other data (variable length strings and arrays) are stored elsewhere and referred to from the data buckets. StandardStMan can store all scalar and array following data types supported by CTDS (as little or big endian).




Data Type Size Directly






Scalar



Bool 1 byte yes
uChar 1 byte yes
Short 1 byte yes
uShort 2 byte yes
Int 4 byte yes
uInt 4 byte yes
Int64 8 byte yes
Float 4 byte yes
Double 8 bytes yes
Complex 8 bytes yes
DComplex 16 bytes yes
String with max length maxlen bytes yes; has trailing zero if string size < maxlen
Variable string <= 8 bytes12 bytes yes; 8 string bytes followed by Int giving length
Variable string > 8 bytes 12 bytes reference to heap bucket as Int[3] (bucketnr, offset, length)



Direct Arrays



Bool .. DComplex nelem * data type sizeyes; (the shape is known in the Table Description); Bool arrays are stored as bits
String 12 bytes reference to heap bucket as Int[3] (bucketnr, offset, length)



Indirect Arrays



all types 8 bytes offset in the indirect file



String Heap Buckets
These buckets contain arrays of strings and variable length scalar strings. Each bucket start with a small header of 4 integers whereafter the string data are stored.



Data Type

Description





Int

reserved for the free bucket list

Int

used length; the number of bytes used (including gaps), thus the next free byte in the bucket.

Int

ndeleted; the total length of the gaps arising from deletion or updating a value with a shorter string.

Int

next bucket; the bucket containing the possible continuation of the last string (array) in this bucket. -1 means no continuation. Note that the continuation can be continued again.



Strings are stored consecutevily in the buckets. A long string and an array of strings are continued in another bucket if they do not fit entirely in the current bucket. As described above the continuation is continued as well if it is very, very long.

It is possible in CTDS to change an existing value, thus a string (array) can be replaced by another one. If possible the new value is written at the same location, where a gap arises if the new value is shorter. If it is longer, it has to be stored at a new location and the old location will be a gap. Similarly, a gap arises if a string is deleted because a row or column is deleted. Only the total size of gaps is administered, not their locations. Thus usually gap space will be lost. Only if it is at the end of the used part of a bucket it can be deducted from the used length and be reused. Note that if all string space in a bucket is deleted, the bucket will be added to the free bucket list.

A string (array) is stored in the heap buckets depending on its type.

The data bucket refers to the string (array) by means of a bucket number, offset and length. In case of continuation that bucket number is the first bucket. The length is the total length, thus including shape and flag if applicable.

Index Buckets
The overall index contains one or more indices, one for each set of columns. An index defines the rows contained in the buckets used by the set of columns. This is done by keeping the last row number per bucket. The index is needed to cope with the possibly variable number of rows in the buckets due to removed rows.

An index also contains information about free space in the data buckets due to removed columns because that space can possibly be reused when a column is added later. The map consists of a std::map <Int,Int > object telling the offset and length of each free space part in a data bucket. Note it is the same for each data bucket described by the index.

The indices to write are serialized and stored in one or more buckets. The first 8 bytes of the index buckets contain the bucket number of the next part of the serialized indices as 2 big-endian signed integer values. This could be done in 4 bytes, but for redundancy purposes it is done twice. A value -1 indicates no next index bucket.

When rewriting the index, care is taken that it is done in different buckets to assure that the index is always present in case the system crashes in the middle of writing the index. Once the index is written successfully, the header is updated and the old index buckets are removed and added to the free bucket list.
If the serialized index fits in half a bucket (which is often the case), that bucket is reused in subsequent index rewrites by alternating between both parts of the bucket. The index offset field in the header (see above) tells the offset of the index in the bucket.

The indices are serialized in the following AipsIO structure. Note that the number of indices is part of the header.



Data Type

Description





AipsIO header

SSMIndex (version 1 or 2)

uInt

Number of index entries

uInt

Number of rows fitting in a bucket

Int

Number of columns served by this index

std::map <Int,Int >

Free space map

Block <uInt > or Block <Int64 >

Last row in each bucket; Int64 for version >1



Free Buckets
A bucket is added to of the free bucket list once it does not contain any data anymore. The first free bucket is given in the header, the others form a list by maintaining the next bucket number in the first 4 bytes of the bucket in big endian format.

Actually, a free bucket is added to the head of the list to avoid having to update the last free bucket. Thus the current first free bucket is put at the start of the new free bucket, which in its turn becomes the first free bucket to be stored in the header.

3.4.2 IncrementalStMan

The IncrementalStMan is meant for data that changes seldomly, so they can be stored in a compressed way by storing the number of rows the data value is the same. It can be done for scalars as well as arrays.

The storage manager cannot add nor remove columns, but it can add and remove rows. The data structures in the files are designed for these properties.

Header
The header block is contained in the first 512 bytes of the table.f<i> file. It contains the following AipsIO structure.



Data Type

Description





AipsIO header

IncrementalStMan (version 1, 2, 3, 4 or 5)

Bool

data stored in big endian format? (not present for version <= 2=4)

uInt

bucket size in bytes

uInt

number of buckets

uInt

Persistent cache size (in buckets)

uInt

unique column number

columns added
uInt

number of free buckets

Int

first free bucket (-1 is no free bucket)



DMInfo
IncrementalStMan stores the following data manager info in the DMInfo part of the ColumnSet AipsIO structure.



uInt[ncolumn]

the number of the column set each column belongs to

uInt[ncolumn]

the offset of each column in the data buckets



The number of columns is defined in the table description.

Data Bucket
A data bucket contains the data of all columns for a given number of rows. It is split into a data part (at the beginning of each bucket) and an index part (at the end of each bucket). The first 4 bytes of each bucket give the offset of the index part (thus also imply the length of the data part).
The endianness of the data and index part is the same as the endianness of the table.



Data Type

Description





uInt

offset of the index in this bucket. The high byte of the offset defines if row numbers are stored as 32 or 64 bits (0 = 32 bits, 1 = 64 bits). Note that this does not need to be the first byte, because that depends on the endianness.

byte[offset-4]

the data part

byte[N]

the index part as described below



Data rows can be added to a bucket until it is full, after which a new bucket is created. When a value in a bucket gets updated, it may not fit in the bucket anymore. In that case the bucket is split and a new bucket is created. When a row is removed, a bucket may get empty and added to the free bucket list.
Note that this storage manager cannot remove columns.

Data Part
The data part contains the values of all columns bound to this storage manager. Scalars and fixed shaped arrays of all data types (including strings) can be stored. However, a string cannot span buckets. Variable sized arrays are stored in the indirect array file; their offsets in that file are stored in the bucket’s data part.

All values in the data part are stored consecutively. Bool arrays are stored as bits. The index part points to the correct data value. Note that the shape of a fixed shaped array is not stored since it is part of the table description. The shape of a variable sized array is stored with its data in the indirect array file.

Index Part
The bucket index defines the starting and end row of the data in a bucket. The index part in each bucket defines per column which rows have which values. Subsequent rows can have the same value, thus the index part defines the first and last row number in the bucket having that value. In fact, it only defines the first row number, because the first row number of the next value defines the last row number of this value. Note that these row numbers start at 0. The bucket index defines the actual row number; it also defines the last row number of the last value.

The index part contains per column the following information. It is stored consecutively for all columns.



Data Type

Description





uInt

number of values

uInt or Int64[nr]

first row number in bucket having the corresponding value. It starts at 0.

uInt[nr]

offset in data part for value given by the row number



Bucket Index
The bucket index defines which rows are contained in each bucket. It is written at the end of the file, thus right after the last bucket, as an AipsIO structure with the following fields.



Data Type

Description





AipsIO header

ISMIndex (version 1 or 2)

uInt

number of buckets

uInt or Int64[nr+1]

first row number in each bucket



Free Buckets
A bucket is added to of the free bucket list once it does not contain any data anymore. The first free bucket is given in the header, the others form a list by maintaining the next bucket number in the first 4 bytes of the bucket in big endian format.

Actually, a free bucket is added to the head of the list to avoid having to update the last free bucket. Thus the current first free bucket is put (as the next free bucket) at the start of the new free bucket, which in its turn becomes the first free bucket to be stored in the header.

3.4.3 TiledStMan

The Tiled Storage Manager stores the data in a tiled way to achieve that access along the different axes is about equally fast. It is similar to the chunked storage of data arrays in HDF5. This storage manager can only be used for columns containing array with a fixed sized data type, thus scalar columns and string array columns cannot be stored. There are a few flavours.

- index - cell, column, shape stman

3.4.4 AipsIOStMan

3.4.5 Indirect Array File

Variable shaped arrays cannot be stored directly in most storage managers. Instead they are stored in a separate file. The offset in that file is stored by the storage managers.

3.5 Virtual Column Engines

3.5.1 ForwardColumnEngine

3.5.2 ScaledComplex

3.5.3 VirtualTaQLColumn

4 RefTable

5 ConcatTable

6 MultiFile Format

The MultiFile format has been designed to reduce the number of files. CTDS uses one or more files per storage manager which can result in quite a large number of files for a table. In particular a MeasurementSet and its subtables can consist of dozens of files. Often these files are quite small mapping poorly to more modern file systems such as Lustre, which use large IO blocks.
The MultiFile format has been designed to combine all these files into a single container file. Furthermore, MultiFile has the option to store a CRC value for each block to ensure data are read back correctly.

The MultiFile format comes in 2 flavours:
1. The MultiFile which is a regular file with an header to denote all individual files in it.
2. The MultiHDF5 which is an HDF5 file containing a dataset per individual file.
They are described in more detail below.

6.1 MultiFile

A MultiFile is a binary file divided into (large) blocks of equal size. Each CTDS file is stored as a virtual file in one or more blocks in the MultiFIle. A header describes the MultiFile layout and the virtual files. It contains an index telling the name of the CTDS files and the block numbers containing their data. The header also contains a list of free blocks which arise when a CTDS file is deleted or truncated.

The MultiFile concept has evolved over time. The second version is more powerful and more robust than the first version. In both versions the header is maintained in memory and occasionally flushed to disk. It makes use of the AipsIO mechanism to serialize the header and to store it in the first block of the MultiFile. Continuation blocks are used if it is too large for a single block.

6.1.1 MultiFile version 1

Version 1 of the MultiFile can contain the files of a single CTDS table. It does not support nested MultiFiles as version 2 does. Neither does it support CRC values. Header continuation blocks are not stored in the MultiFile itself, but in a small separate file with the extra extension _hdrext in its file name.
The header is stored as follows:



Data Type

Description





Int64

header size in bytes

Int64

block size in bytes

Int64

counter keeping track how often the header was written. It is used to know if changes have been made by another process requiring the header to be reread.

AipsIO

AipsIO structure containing the entire MultiFile index



The MultiFile index is serialized in the following AipsIO structure.



Data Type

Description





AipsIO header

MultiFile (version 1)

Int64

Total number of blocks used in the MultiFile (including free blocks)

C-array of FileInfo

Vector of objects containing the info of each CTDS file in the MultiFile

C-array of Int64

Vector of free blocks



Each FileInfo entry contains:



Data Type

Description





String

CTDS file name

C-array of Int64

Vector of MultiFile block numbers used for this CTDS file

Int64

Total size in bytes of the CTDS file



6.1.2 MultiFile version 2

Version 2 of the MultiFile can contain multiple CTDS files (as version 1 does), but it also supports nested MultiFiles. Nested MultiFiles are used to store subtables in the MultiFile of the parent table to achieve that an entire MeasurementSet is stored in a single file. A nested MultiFile is the same as any other MultiFile, thus having its own header, index, block size, etc. It is stored as a single file in the parent MultiFile. However, a nested MultiFile does not have CRC values.
The header is stored as follows:




OffsetData Type

Description







0Int64

0 (to make it different from version 1)

8Int64

blocknr of first continuation header block (0 = no continuation)

16Int64

counter keeping track how often the header was written. It is used to know if changes have been made by another process requiring the header to be reread.

24Int32

version (=2)

28uInt32

CRC of entire header (using 0 for this CRC value)

32Int64

header size in bytes

40Int64

block size in bytes

48Int64

Total number of blocks used in the MultiFile

56 char

use CRC?

57char[7]

spare

64AipsIO

AipsIO structure containing the MultiFile index

nfile*FileInfo

general info per file

nfile*Index

packed index per file

Index

packed index of free blocks

Int64

nCRC (0 if CRCs are not used)

nCRC*uInt32

CRC value per block

Int64

number of available blocks of first header continuation

n*Int64

block numbers available for first header continuation

Int64

number of available blocks of second header continuation

n*Int64

block numbers available for second header continuation

2*Int64

number of block actually used for first and second continuation




There are two header continuation blocks to improve robustness. They are used alternately when the header is flushed to disk. The first block (at offset 0 of the file) it written last. This ensures there is always a valid header in case of a crash. Note that new continuation blocks are added as needed. When added, the header is serialized again to contain those new block numbers. In case fewer continuation blocks are needed than there are available, the superfluous blocks are not added to the free list but are kept available.
The MultiFile index is serialized in the following AipsIO structure.



Data Type

Description





AipsIO header

MultiFile (version 1)

C-array of FileInfo

Vector of objects containing the info of each CTDS file in the MultiFile



Each FileInfo entry looks as:



Data Type

Description





String

CTDS file name

Int64

Total size in bytes of the CTDS file

Bool

Is this file a nested MultiFile?



A packed index is a compressed form of the index telling which blocks are used by a file. It is compressed by storing a repeat count for consecutive block numbers. The repeat count is a negative value to make it different from the block numbers. For example:

    10,-3,17   means block   10,11,12,17

6.2 MultiHDF5