casacore
A bucket in the Incremental Storage Manager. More...
#include <ISMBucket.h>
Public Member Functions | |
ISMBucket (ISMBase *parent, const char *bucketStorage) | |
Create a bucket with the given parent. More... | |
~ISMBucket () | |
uInt | getInterval (uInt colnr, rownr_t rownr, rownr_t bucketNrrow, rownr_t &start, rownr_t &end, uInt &offset) const |
Get the row-interval for given column and row. More... | |
Bool | canAddData (uInt leng) const |
Is the bucket large enough to add a value? More... | |
void | addData (uInt colnr, rownr_t rownr, uInt index, const char *data, uInt leng) |
Add the data to the data part. More... | |
Bool | canReplaceData (uInt newLeng, uInt oldLeng) const |
Is the bucket large enough to replace a value? More... | |
void | replaceData (uInt &offset, const char *data, uInt newLeng, uInt fixedLength) |
Replace a data item. More... | |
const char * | get (uInt offset) const |
Get a pointer to the data for the given offset. More... | |
uInt | getLength (uInt fixedLength, const char *data) const |
Get the length of the data value. More... | |
uInt & | getOffset (uInt colnr, rownr_t rownr) |
Get access to the offset of the data for given column and row. More... | |
Block< rownr_t > & | rowIndex (uInt colnr) |
Get access to the index information for the given column. More... | |
Block< uInt > & | offIndex (uInt colnr) |
Return the offsets of the values stored in the data part. More... | |
uInt & | indexUsed (uInt colnr) |
Return the number of values stored. More... | |
rownr_t | split (ISMBucket *&left, ISMBucket *&right, Block< Bool > &duplicated, rownr_t bucketStartRow, rownr_t bucketNrrow, uInt colnr, rownr_t rownr, uInt lengToAdd) |
Split the bucket in the middle. More... | |
Bool | simpleSplit (ISMBucket *left, ISMBucket *right, Block< Bool > &duplicated, rownr_t &splitRownr, rownr_t rownr) |
Determine whether a simple split is possible. More... | |
uInt | getSplit (uInt totLeng, const Block< uInt > &rowLeng, const Block< uInt > &cumLeng) |
Return the index where the bucket should be split to get two parts with almost identical length. More... | |
void | shiftLeft (uInt index, uInt nr, Block< rownr_t > &rowIndex, Block< uInt > &offIndex, uInt &nused, uInt leng) |
Remove nr items from data and index part by shifting to the left. More... | |
void | copy (const ISMBucket &that) |
Copy the contents of that bucket to this bucket. More... | |
void | show (ostream &os) const |
Show the layout of the bucket. More... | |
Bool | check (uInt &offendingCol, uInt &offendingIndex, rownr_t &offendingRow, rownr_t &offendingPrevRow) const |
Check that there are no repeated rowIds in the bucket. More... | |
Static Public Member Functions | |
static char * | readCallBack (void *owner, const char *bucketStorage) |
Callback function when BucketCache reads a bucket. More... | |
static void | writeCallBack (void *owner, char *bucketStorage, const char *bucket) |
Callback function when BucketCache writes a bucket. More... | |
static char * | initCallBack (void *owner) |
Callback function when BucketCache adds a new bucket to the data file. More... | |
static void | deleteCallBack (void *, char *bucket) |
Callback function when BucketCache removes a bucket from the cache. More... | |
Private Member Functions | |
ISMBucket (const ISMBucket &) | |
Forbid copy constructor. More... | |
ISMBucket & | operator= (const ISMBucket &) |
Forbid assignment. More... | |
void | removeData (uInt offset, uInt leng) |
Remove a data item with the given length. More... | |
uInt | insertData (const char *data, uInt leng) |
Insert a data value by appending it to the end. More... | |
uInt | copyData (ISMBucket &other, uInt colnr, rownr_t toRownr, uInt fromIndex, uInt toIndex) const |
Copy a data item from this bucket to the other bucket. More... | |
void | read (const char *bucketStorage) |
Read the data from the storage into this bucket. More... | |
void | write (char *bucketStorage) const |
Write the bucket into the storage. More... | |
Private Attributes | |
ISMBase * | stmanPtr_p |
Pointer to the parent storage manager. More... | |
uInt | uIntSize_p |
The size (in bytes) of an uInt and rownr_t (used in index, etc.). More... | |
uInt | rownrSize_p |
uInt | dataLeng_p |
The size (in bytes) of the data. More... | |
uInt | indexLeng_p |
The size (in bytes) of the index. More... | |
PtrBlock< Block< rownr_t > * > | rowIndex_p |
The row index per column; each index contains the row number of each value stored in the bucket (for that column). More... | |
PtrBlock< Block< uInt > * > | offIndex_p |
The offset index per column; each index contains the offset (in bytes) of each value stored in the bucket (for that column). More... | |
Block< uInt > | indexUsed_p |
Number of used elements in each index; i.e. the number of stored values per column. More... | |
char * | data_p |
The data space (in external (e.g. canonical) format). More... | |
A bucket in the Incremental Storage Manager.
Internal
ISMBucket represents a bucket in the Incremental Storage Manager.
The Incremental Storage Manager uses a BucketCache object to read/write/cache the buckets containing the data. An ISMBucket object is the internal representation of the contents of a bucket. ISMBucket contains static callback functions which are called by BucketCache when reading/writing a bucket. These callback functions map the raw bucket data to an ISMBucket object and vice versa.
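The callback arrangement described above can be pictured with a small sketch. The function-pointer shapes follow the static member signatures listed on this page; `ToyBucket`, `ToyCache`, and the `toy*` functions are hypothetical names used only for illustration and are not part of casacore.

```cpp
#include <cstring>
#include <string>

// Function-pointer shapes matching the static callbacks listed on this page.
using ReadCB   = char* (*)(void* owner, const char* bucketStorage);
using WriteCB  = void  (*)(void* owner, char* bucketStorage, const char* bucket);
using DeleteCB = void  (*)(void* owner, char* bucket);

// Hypothetical in-memory object standing in for ISMBucket.
struct ToyBucket { std::string text; };

// Hypothetical cache: it only sees raw storage and opaque char* objects.
struct ToyCache { void* owner; ReadCB read; WriteCB write; DeleteCB del; };

// Raw storage -> object (the cache takes the returned pointer into its care).
char* toyRead(void* /*owner*/, const char* bucketStorage) {
    return reinterpret_cast<char*>(new ToyBucket{std::string(bucketStorage)});
}

// Object -> raw storage (the caller guarantees the storage is large enough).
void toyWrite(void* /*owner*/, char* bucketStorage, const char* bucket) {
    const ToyBucket* b = reinterpret_cast<const ToyBucket*>(bucket);
    std::memcpy(bucketStorage, b->text.c_str(), b->text.size() + 1);
}

// Called when the cache drops the object.
void toyDelete(void* /*owner*/, char* bucket) {
    delete reinterpret_cast<ToyBucket*>(bucket);
}
```

The cache never needs to know the object's type; it passes the opaque pointer back to the callbacks that created it.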
A bucket contains the values of several rows of all columns bound to this Incremental Storage Manager. A bucket is split into a data part and an index part. Each part has an arbitrary length but together they do not exceed the fixed bucket length.
The beginning of the data part contains the values of all columns bound. The remainder of the data part contains the values of the rows/columns with a changed value.
The index part contains an index per column. Each index contains the row number and an offset for a row with a stored value. The row numbers are relative to the beginning of the bucket, so the bucket has no knowledge about the absolute row numbers. In this way deletion of rows is much simpler.
The contents of a bucket look like:
The data part contains all data values belonging to the bucket. The index part contains for each column the following data:
Note that the row numbers in the bucket start at 0, thus are relative to the beginning of the bucket. The main index kept in ISMIndex knows the starting row of each bucket. In this way bucket splitting and especially row removal is much easier.
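A minimal sketch of why relative row numbers simplify row removal, assuming the main index is just a sorted list of bucket start rows. `bucketStart`, `locateRow`, and `removeRowFromIndex` are hypothetical names; the real ISMIndex class is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BucketLocation {
    std::size_t   bucketNr;  // which bucket holds the row
    std::uint64_t relRow;    // row number relative to the bucket start
};

// bucketStart[i] is the absolute row number of the first row in bucket i.
// Assumed: non-empty, sorted ascending, and bucketStart[0] == 0.
BucketLocation locateRow(const std::vector<std::uint64_t>& bucketStart,
                         std::uint64_t absRow) {
    // First bucket starting beyond absRow; the bucket before it holds the row.
    auto it = std::upper_bound(bucketStart.begin(), bucketStart.end(), absRow);
    std::size_t nr = static_cast<std::size_t>(it - bucketStart.begin()) - 1;
    return {nr, absRow - bucketStart[nr]};
}

// After a row is removed from bucket nr, only the start rows of the later
// buckets change; the relative row numbers stored inside them stay valid.
void removeRowFromIndex(std::vector<std::uint64_t>& bucketStart, std::size_t nr) {
    for (std::size_t i = nr + 1; i < bucketStart.size(); ++i) {
        --bucketStart[i];
    }
}
```

With absolute row numbers stored in every bucket, the same removal would require rewriting the index part of every later bucket.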
The bucket can be stored in canonical or local (i.e. native) data format. When a bucket is read into memory, its data are read, converted, and stored in the ISMBucket object. When the bucket is flushed, its contents are written back. ISMBucket ensures that the values stored in its object do not exceed the size of the bucket. When the bucket is full, the user can call a function to split it into a left and a right bucket. When the new value has to be written at the end, the split merely consists of creating a new bucket. In any case, care is taken that a row is not split, so a row is always entirely contained in one bucket.
Class ISMColumn does the actual writing of data in a bucket and uses the relevant ISMBucket functions.
ISMBucket encapsulates the data of a bucket.
Definition at line 132 of file ISMBucket.h.
casacore::ISMBucket::ISMBucket (ISMBase *parent, const char *bucketStorage)
Create a bucket with the given parent.
When bucketStorage is non-zero, reconstruct the object from it. It keeps the pointer to its parent (but does not own it).
casacore::ISMBucket::~ISMBucket ()
casacore::ISMBucket::ISMBucket (const ISMBucket &)
private
Forbid copy constructor.
void casacore::ISMBucket::addData (uInt colnr, rownr_t rownr, uInt index, const char *data, uInt leng)
Add the data to the data part.
It updates the bucket index at the given index. An exception is thrown if the bucket is too small.
Bool casacore::ISMBucket::canReplaceData (uInt newLeng, uInt oldLeng) const
Is the bucket large enough to replace a value?
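The capacity checks can be pictured as simple byte accounting. The sketch below assumes that adding a value costs its data bytes plus one row number and one offset in the index, and that replacing a value only changes the data part. This cost model, the `BucketSizes` struct, and both function names are illustrative assumptions, not the library's implementation.

```cpp
#include <cstdint>

// Hypothetical bookkeeping for one bucket; all sizes are in bytes.
struct BucketSizes {
    std::uint32_t bucketSize;  // fixed total length of the bucket
    std::uint32_t dataLeng;    // currently used by the data part
    std::uint32_t indexLeng;   // currently used by the index part
    std::uint32_t rownrSize;   // size of one stored row number
    std::uint32_t offsetSize;  // size of one stored offset
};

// Would a new value of leng bytes (plus its index entry) still fit?
bool fitsNewValue(const BucketSizes& b, std::uint32_t leng) {
    // Sum in 64 bits so the check itself cannot overflow.
    std::uint64_t needed = static_cast<std::uint64_t>(b.dataLeng) + leng +
                           b.indexLeng + b.rownrSize + b.offsetSize;
    return needed <= b.bucketSize;
}

// Would replacing oldLeng bytes by newLeng bytes still fit?
// The index entry is reused, so only the data part grows or shrinks.
bool fitsReplacement(const BucketSizes& b, std::uint32_t newLeng,
                     std::uint32_t oldLeng) {
    std::uint64_t needed = static_cast<std::uint64_t>(b.dataLeng) +
                           b.indexLeng + newLeng;
    return needed <= static_cast<std::uint64_t>(b.bucketSize) + oldLeng;
}
```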
Bool casacore::ISMBucket::check (uInt &offendingCol, uInt &offendingIndex, rownr_t &offendingRow, rownr_t &offendingPrevRow) const
Check that there are no repeated rowIds in the bucket.
void casacore::ISMBucket::copy (const ISMBucket &that)
Copy the contents of that bucket to this bucket.
This is used after a split operation.
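uInt casacore::ISMBucket::copyData (ISMBucket &other, uInt colnr, rownr_t toRownr, uInt fromIndex, uInt toIndex) const
private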
Copy a data item from this bucket to the other bucket.
static void casacore::ISMBucket::deleteCallBack (void *, char *bucket)
static
Callback function when BucketCache removes a bucket from the cache.
This function deletes the ISMBucket bucket object.
const char * casacore::ISMBucket::get (uInt offset) const
inline
Get a pointer to the data for the given offset.
Definition at line 315 of file ISMBucket.h.
References data_p.
uInt casacore::ISMBucket::getInterval (uInt colnr, rownr_t rownr, rownr_t bucketNrrow, rownr_t &start, rownr_t &end, uInt &offset) const
Get the row-interval for given column and row.
It sets the start and end of the interval to which the row belongs and the offset of its current value. It returns the index where the row number can be put in the bucket index.
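A minimal sketch of such an interval lookup, using `std::vector` in place of `Block` and a binary search over one column's row index. It assumes the index is non-empty, sorted, and starts with row 0 (each column has a value at the start of the bucket). `Interval` and `findInterval` are hypothetical names, and the sketch is not the library's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Interval {
    std::uint64_t start;        // first row sharing the value
    std::uint64_t end;          // last row sharing the value (inclusive)
    std::uint32_t offset;       // offset of that value in the data part
    std::size_t   insertIndex;  // where rownr would go in the row index
};

// rows: relative row numbers with a stored value (sorted, rows[0] == 0).
// offsets: offset of the value stored for each entry of rows.
Interval findInterval(const std::vector<std::uint64_t>& rows,
                      const std::vector<std::uint32_t>& offsets,
                      std::uint64_t rownr, std::uint64_t bucketNrrow) {
    // First entry whose row number is >= rownr.
    auto it = std::lower_bound(rows.begin(), rows.end(), rownr);
    std::size_t insertIndex = static_cast<std::size_t>(it - rows.begin());
    std::size_t inx = insertIndex;
    if (it == rows.end() || *it != rownr) {
        --inx;  // no exact match: the preceding entry's value applies
    }
    // The interval runs until the next stored row or the end of the bucket.
    std::uint64_t next = (inx + 1 < rows.size()) ? rows[inx + 1] : bucketNrrow;
    return {rows[inx], next - 1, offsets[inx], insertIndex};
}
```

For a row index of {0, 5, 9} in a bucket of 12 rows, row 6 falls in the interval 5 to 8 and a new entry for it would be inserted at index 2.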
uInt casacore::ISMBucket::getLength (uInt fixedLength, const char *data) const
Get the length of the data value.
It is fixedLength when non-zero; otherwise it is read from the data value.
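The fixed-versus-variable rule can be sketched as follows. The sketch assumes that a variable-length value begins with its own length as a 4-byte unsigned integer in native byte order; the actual on-disk encoding (canonical or local) is not specified on this page, and `valueLength` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

// Returns the length of the value at data.
// fixedLength != 0: every value of the column has that length.
// fixedLength == 0: the length is read from the value itself
// (assumed here: a leading 4-byte native-endian length field).
std::uint32_t valueLength(std::uint32_t fixedLength, const char* data) {
    if (fixedLength != 0) {
        return fixedLength;
    }
    std::uint32_t leng;
    std::memcpy(&leng, data, sizeof leng);  // memcpy avoids unaligned reads
    return leng;
}
```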
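uInt & casacore::ISMBucket::getOffset (uInt colnr, rownr_t rownr)
Get access to the offset of the data for given column and row.
The returned reference allows the offset to be changed (used, for example, by replaceData).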
uInt casacore::ISMBucket::getSplit (uInt totLeng, const Block< uInt > &rowLeng, const Block< uInt > &cumLeng)
Return the index where the bucket should be split to get two parts with almost identical length.
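One plausible way to find such a split point is a binary search over the cumulative lengths for half of the total, followed by a check of the neighbouring entry. The sketch below takes only a cumulative-length vector and is not the library's algorithm; `splitIndex` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// cumLeng[i] is the total number of bytes used by entries 0..i.
// Assumed: non-empty and non-decreasing.
// Returns i such that entries 0..i form the left part and the remaining
// entries the right part, with the left length as close to half as possible.
std::size_t splitIndex(const std::vector<std::uint32_t>& cumLeng) {
    std::uint32_t half = cumLeng.back() / 2;
    // First entry at which the running total reaches half of the total.
    auto it = std::lower_bound(cumLeng.begin(), cumLeng.end(), half);
    std::size_t i = static_cast<std::size_t>(it - cumLeng.begin());
    // The previous entry may be closer to half; both differences are
    // non-negative because cumLeng[i-1] < half <= cumLeng[i].
    if (i > 0 && half - cumLeng[i - 1] < cumLeng[i] - half) {
        --i;
    }
    return i;
}
```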
uInt & casacore::ISMBucket::indexUsed (uInt colnr)
Return the number of values stored.
Definition at line 327 of file ISMBucket.h.
References indexUsed_p.
static char * casacore::ISMBucket::initCallBack (void *owner)
static
Callback function when BucketCache adds a new bucket to the data file.
This function creates an empty ISMBucket object. It returns a pointer to the ISMBucket object, which becomes part of the cache. The object is deleted by the deleteCallBack function.
uInt casacore::ISMBucket::insertData (const char *data, uInt leng)
private
Insert a data value by appending it to the end.
It returns the offset of the data value.
Block< uInt > & casacore::ISMBucket::offIndex (uInt colnr)
Return the offsets of the values stored in the data part.
Definition at line 323 of file ISMBucket.h.
References offIndex_p.
void casacore::ISMBucket::read (const char *bucketStorage)
private
Read the data from the storage into this bucket.
static char * casacore::ISMBucket::readCallBack (void *owner, const char *bucketStorage)
static
Callback function when BucketCache reads a bucket.
It creates an ISMBucket object and converts the raw bucketStorage to that object. It returns a pointer to the ISMBucket object, which becomes part of the cache. The object is deleted by the deleteCallBack function.
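void casacore::ISMBucket::removeData (uInt offset, uInt leng)
private
Remove a data item with the given length.
If the given length is zero, the item has a variable length, which is read from the item first.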
void casacore::ISMBucket::replaceData (uInt &offset, const char *data, uInt newLeng, uInt fixedLength)
Replace a data item.
When its length is variable (indicated by fixedLength=0), the old value will be removed and the new one appended at the end. An exception is thrown if the bucket is too small.
Block< rownr_t > & casacore::ISMBucket::rowIndex (uInt colnr)
Get access to the index information for the given column.
This is used by ISMColumn when putting the data.
Return the row numbers with a stored value.
Definition at line 319 of file ISMBucket.h.
References rowIndex_p.
void casacore::ISMBucket::shiftLeft (uInt index, uInt nr, Block< rownr_t > &rowIndex, Block< uInt > &offIndex, uInt &nused, uInt leng)
Remove nr items from data and index part by shifting to the left.
The rowIndex, offIndex, and nused get updated. The caller is responsible for removing data when needed (e.g. ISMIndColumn removes the indirect arrays from its file).
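The index half of this operation can be sketched with `std::vector` standing in for `Block`. The sketch covers only the row and offset indices; removing the corresponding bytes from the data part (the role of the `leng` argument) is left out. `shiftIndexLeft` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Remove nr entries starting at index from both index arrays.
// Only the first nused entries of each array are in use; the arrays may
// have spare capacity behind them. Assumed: index + nr <= nused.
void shiftIndexLeft(std::size_t index, std::size_t nr,
                    std::vector<std::uint64_t>& rowIndex,
                    std::vector<std::uint32_t>& offIndex,
                    std::size_t& nused) {
    // Move the entries behind the removed range down to position index.
    // std::copy is valid here because the destination starts before the source.
    std::copy(rowIndex.begin() + index + nr, rowIndex.begin() + nused,
              rowIndex.begin() + index);
    std::copy(offIndex.begin() + index + nr, offIndex.begin() + nused,
              offIndex.begin() + index);
    nused -= nr;
}
```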
void casacore::ISMBucket::show (ostream &os) const
Show the layout of the bucket.
Bool casacore::ISMBucket::simpleSplit (ISMBucket *left, ISMBucket *right, Block< Bool > &duplicated, rownr_t &splitRownr, rownr_t rownr)
Determine whether a simple split is possible.
If so, do it. This is possible if the new row is at the end of the last bucket, which will often be the case.
A simple split means adding a new bucket for the new row. If the old bucket already contains values for that row, those values are moved to the new bucket.
This function is only called by split, which creates the left and right buckets.
rownr_t casacore::ISMBucket::split (ISMBucket *&left, ISMBucket *&right, Block< Bool > &duplicated, rownr_t bucketStartRow, rownr_t bucketNrrow, uInt colnr, rownr_t rownr, uInt lengToAdd)
Split the bucket in the middle.
It returns the row number where the bucket was split and the new left and right bucket. The caller is responsible for deleting the newly created buckets. When possible a simple split is done.
The starting values in the right bucket may be copies of the values in the left bucket. The duplicated Block contains a switch per column indicating if the value is copied.
void casacore::ISMBucket::write (char *bucketStorage) const
private
Write the bucket into the storage.
static void casacore::ISMBucket::writeCallBack (void *owner, char *bucketStorage, const char *bucket)
static
Callback function when BucketCache writes a bucket.
It converts the ISMBucket bucket object to the raw bucketStorage.
char * casacore::ISMBucket::data_p
private
The data space (in external (e.g. canonical) format).
Definition at line 311 of file ISMBucket.h.
Referenced by get().
uInt casacore::ISMBucket::dataLeng_p
private
The size (in bytes) of the data.
Definition at line 298 of file ISMBucket.h.
uInt casacore::ISMBucket::indexLeng_p
private
The size (in bytes) of the index.
Definition at line 300 of file ISMBucket.h.
Block< uInt > casacore::ISMBucket::indexUsed_p
private
Number of used elements in each index; i.e. the number of stored values per column.
Definition at line 309 of file ISMBucket.h.
Referenced by indexUsed().
PtrBlock< Block< uInt > * > casacore::ISMBucket::offIndex_p
private
The offset index per column; each index contains the offset (in bytes) of each value stored in the bucket (for that column).
Definition at line 306 of file ISMBucket.h.
Referenced by offIndex().
PtrBlock< Block< rownr_t > * > casacore::ISMBucket::rowIndex_p
private
The row index per column; each index contains the row number of each value stored in the bucket (for that column).
Definition at line 303 of file ISMBucket.h.
Referenced by rowIndex().
uInt casacore::ISMBucket::rownrSize_p
private
Definition at line 296 of file ISMBucket.h.
ISMBase * casacore::ISMBucket::stmanPtr_p
private
Pointer to the parent storage manager.
Definition at line 293 of file ISMBucket.h.
uInt casacore::ISMBucket::uIntSize_p
private
The size (in bytes) of an uInt and rownr_t (used in index, etc.).
Definition at line 295 of file ISMBucket.h.