casacore
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Groups Pages
Public Types | Public Member Functions | Private Member Functions | Private Attributes | List of all members
casacore::Sort Class Reference

Sort on one or more keys, ascending and/or descending. More...

#include <Sort.h>

Public Types

enum  Option {
  DefaultSort,
  HeapSort,
  InsSort,
  QuickSort,
  ParSort,
  NoDuplicates
}
 Enumerate the sort options: More...
 
enum  Order {
  Ascending,
  Descending
}
 Enumerate the sort order: More...
 

Public Member Functions

 Sort ()
 The default constructor can be used when the data is only passed in via function sortKey. More...
 
 Sort (const void *data, uInt elementSize)
 Construct a Sort object for the given data array with elements of elementSize bytes. More...
 
 Sort (const Sort &)
 Copy constructor (copy semantics). More...
 
 ~Sort ()
 
Sortoperator= (const Sort &)
 Assignment (copy semantics). More...
 
void sortKey (const void *data, DataType, uInt increment=0, Order=Ascending)
 Define a sort key (the most significant key should be defined first). More...
 
void sortKey (const void *data, const CountedPtr< BaseCompare > &, uInt increment, Order=Ascending)
 
void sortKey (uInt offset, DataType, Order=Ascending)
 
void sortKey (uInt offset, const CountedPtr< BaseCompare > &, Order=Ascending)
 
uInt sort (Vector< uInt > &indexVector, uInt nrrec, int options=DefaultSort, Bool tryGenSort=True) const
 Sort the data array of nrrec records. More...
 
uInt64 sort (Vector< uInt64 > &indexVector, uInt64 nrrec, int options=DefaultSort, Bool tryGenSort=True) const
 
uInt unique (Vector< uInt > &uniqueVector, uInt nrrec) const
 Get all unique records in a sorted array. More...
 
uInt unique (Vector< uInt > &uniqueVector, const Vector< uInt > &indexVector) const
 
uInt unique (Vector< uInt > &uniqueVector, Vector< size_t > &changeKey, const Vector< uInt > &indexVector) const
 
uInt64 unique (Vector< uInt64 > &uniqueVector, uInt64 nrrec) const
 
uInt64 unique (Vector< uInt64 > &uniqueVector, const Vector< uInt64 > &indexVector) const
 
uInt64 unique (Vector< uInt64 > &uniqueVector, Vector< size_t > &changeKey, const Vector< uInt64 > &indexVector) const
 

Private Member Functions

template<typename T >
doSort (Vector< T > &indexVector, T nrrec, int options=DefaultSort, Bool tryGenSort=True) const
 
template<typename T >
doUnique (Vector< T > &uniqueVector, T nrrec) const
 
template<typename T >
doUnique (Vector< T > &uniqueVector, const Vector< T > &indexVector) const
 
template<typename T >
doUnique (Vector< T > &uniqueVector, Vector< size_t > &changeKey, const Vector< T > &indexVector) const
 
void copy (const Sort &that)
 Copy that Sort object to this. More...
 
void addKey (const void *data, DataType, uInt increment, int options)
 Add a sort key giving a data type and stride or the sort key. More...
 
void addKey (SortKey *)
 
template<typename T >
insSort (T nr, T *indices) const
 Do an insertion sort, optionally skipping duplicates. More...
 
template<typename T >
insSortNoDup (T nr, T *indices) const
 
template<typename T >
parSort (int nthr, T nrrec, T *inx) const
 Do a merge sort, if possible in parallel using OpenMP. More...
 
template<typename T >
void merge (T *inx, T *tmp, T size, T *index, T nparts) const
 
template<typename T >
quickSort (T nr, T *indices) const
 Do a quicksort, optionally skipping duplicates (qkSort is the actual quicksort function). More...
 
template<typename T >
quickSortNoDup (T nr, T *indices) const
 
template<typename T >
void qkSort (T nr, T *indices) const
 
template<typename T >
heapSort (T nr, T *indices) const
 Do a heapsort, optionally skipping duplicates. More...
 
template<typename T >
heapSortNoDup (T nr, T *indices) const
 
template<typename T >
void siftDown (T low, T up, T *indices) const
 Siftdown algorithm for heapsort. More...
 
template<typename T >
int compare (T index1, T index2) const
 Compare 2 records based on the comparison functions. More...
 
template<typename T >
int compareChangeIdx (T i1, T i2, size_t &idxComp) const
 As compare() but it also gives back the index of the first comparison function that didn't match. More...
 
template<typename T >
void swap (T index1, T index2, T *indices) const
 Swap 2 indices. More...
 

Private Attributes

PtrBlock< SortKey * > keys_p
 
size_t nrkey_p
 
const void * data_p
 
uInt size_p
 
int order_p
 

Detailed Description

Sort on one or more keys, ascending and/or descending.

Intended use:

Public interface

Review Status

Reviewed By:
Friso Olnon
Date Reviewed:
1995/03/01
Test programs:
tSort, tSort_1

Synopsis

Sort lets you sort data on one or more keys in a mix of Sort::ascending and Sort::descending order. Duplicates can be skipped by giving the option Sort::NoDuplicates. Only in this case the number of output elements can be different from the number of input elements.
The unique function offers another way of getting unique values.

Class Sort does not sort the data themselves, but returns an index to them. This gives more flexibility and allows the sort to be stable; but it is slower.
Very fast sorting of the data themselves can be done with the functions in class GenSort. If sorting on a single key with a standard data type is done, Sort will use GenSortIndirect to speed up the sort.
Four sort algorithms are provided:

Sort::ParSort
The parallel merge sort is the fastest if it can use multiple threads. For a single thread it has O(n*log(n)) behaviour, but is slower than quicksort. A drawback is that it needs an extra index array to do the merge.
Sort::InsSort
Insertion sort has O(n*n) behaviour, thus is very slow for large arrays. However, it is the fastest method for small arrays (< 50 elements) and for arrays already (almost) in the right order.
Sort::QuickSort
Care has been taken to solve the well-known quicksort problems like "array already in order" or "many equal elements". The behaviour is O(n*log(n)) in all the cases tested, even in degenerated cases where the SUN Solaris qsort algorithm is O(n*n).
Sort::HeapSort
Heapsort has O(n*log(n)) behaviour. Its speed is lower than that of QuickSort, so QuickSort is the default algorithm.

The default is to use QuickSort for small arrays or if only a single thread can be used. Otherwise ParSort is the default.

All sort algorithms are stable, which means that the original order is kept when keys are equal.

The sort is a four step process:

  1. Construct the Sort object.
  2. Define the sort keys. The function sortKey must be called for each sort key (the most significant one first). The comparison object can be passed in directly, or a basic data type can be given. In the latter case the appropriate ObjCompare comparison object will be created.
  3. Sort the data. The function sort returns an index array, which is allocated when needed.
  4. Destruct the Sort object (usually done automatically) and delete the index array.

The data can be in a single array of structs, in separate arrays, or in a mix of those. Of course, all arrays must have the same length. The data can be passed to the Sort constructor and/or to the sortKey function. If passed to the Sort constructor, the offset of the data item in the data array must be given to sortKey.

Example

In the first example we sort the data contained in two "parallel" arrays, idata and ddata, both of length nrdata.

sort.sortKey (idata, TpInt); // define 1st sort key
sort.sortKey (ddata, TpDouble,0,Sort::Descending); // define 2nd sort key
Vector<uInt> inx;
sort.sort (inx, nrdata);
for (uInt i=0; i<nrdata; i++) { // show sorted data
cout << idata[inx[i]] << " " << ddata[inx[i]] << endl;
}

Now nr contains the nr of records (=nrdata) and inx an array of (sorted) indices.

In the second example we sort the data stored in an array of structs on the double (ascending) and the string (descending). We can pass the data to the Sort constructor, and the offsets of the struct elements to the sortKey function.

struct Ts {
String as;
double ad;
}
Vector<uInt> inx;
Sort sort (tsarr, sizeof(Ts));
sort.sortKey ((char*)&tsarr[0].ad - (char*)tsarr, TpDouble);
sort.sortKey ((char*)&tsarr[0].as - (char*)tsarr, TpString,
sort.sort (inx, nrts);

Note that the first argument in function sortKey gives the offset of the variable in the struct.

Alternatively, and probably slightly easier, we could pass the data to the sortKey function and use an increment:

struct Ts {
String as;
double ad;
}
Vector<uInt> inx;
sort.sortKey (&tsarr[0].ad, TpDouble, sizeof(Ts));
sort.sortKey (&tsarr[0].as, TpString, sizeof(Ts), Sort::Descending);
sort.sort (inx, nrts);

Finally, we could provide a comparison object for the struct.

struct Ts {
String as;
double ad;
}
class MyCompare: public BaseCompare {
virtual int comp (const void* val1, const void* val2) const
{
const Ts& t1 = *(Ts*)val1;
const Ts& t2 = *(Ts*)val2;
if (t1.ad < t2.ad) return -1;
if (t1.ad > t2.ad) return 1;
if (t1.as < t2.as) return 1; // string must be descending
if (t1.as > t2.as) return -1;
return 0;
}
};
Vector<uInt> inx;
sort.sortKey (tsarr, compareTs, sizeof(Ts));
sort.sort (inx, nrts);

Definition at line 247 of file Sort.h.

Member Enumeration Documentation

Enumerate the sort options:

Enumerator
DefaultSort 
HeapSort 
InsSort 
QuickSort 
ParSort 
NoDuplicates 

Definition at line 251 of file Sort.h.

Enumerate the sort order:

Enumerator
Ascending 
Descending 

Definition at line 259 of file Sort.h.

Constructor & Destructor Documentation

casacore::Sort::Sort ( )

The default constructor can be used when the data is only passed in via function sortKey.

casacore::Sort::Sort ( const void *  data,
uInt  elementSize 
)

Construct a Sort object for the given data array with elements of elementSize bytes.

This data array will be used when an offset is given to the sortKey functions. You can still pass additional data arrays to the sortKey functions.

casacore::Sort::Sort ( const Sort )

Copy constructor (copy semantics).

casacore::Sort::~Sort ( )

Member Function Documentation

void casacore::Sort::addKey ( const void *  data,
DataType  ,
uInt  increment,
int  options 
)
private

Add a sort key giving a data type and stride or the sort key.

void casacore::Sort::addKey ( SortKey )
private
template<typename T >
int casacore::Sort::compare ( index1,
index2 
) const
private

Compare 2 records based on the comparison functions.

template<typename T >
int casacore::Sort::compareChangeIdx ( i1,
i2,
size_t &  idxComp 
) const
private

As compare() but it also gives back the index of the first comparison function that didn't match.

void casacore::Sort::copy ( const Sort that)
private

Copy that Sort object to this.

template<typename T >
T casacore::Sort::doSort ( Vector< T > &  indexVector,
nrrec,
int  options = DefaultSort,
Bool  tryGenSort = True 
) const
private
template<typename T >
T casacore::Sort::doUnique ( Vector< T > &  uniqueVector,
nrrec 
) const
private
template<typename T >
T casacore::Sort::doUnique ( Vector< T > &  uniqueVector,
const Vector< T > &  indexVector 
) const
private
template<typename T >
T casacore::Sort::doUnique ( Vector< T > &  uniqueVector,
Vector< size_t > &  changeKey,
const Vector< T > &  indexVector 
) const
private
template<typename T >
T casacore::Sort::heapSort ( nr,
T *  indices 
) const
private

Do a heapsort, optionally skipping duplicates.

template<typename T >
T casacore::Sort::heapSortNoDup ( nr,
T *  indices 
) const
private
template<typename T >
T casacore::Sort::insSort ( nr,
T *  indices 
) const
private

Do an insertion sort, optionally skipping duplicates.

template<typename T >
T casacore::Sort::insSortNoDup ( nr,
T *  indices 
) const
private
template<typename T >
void casacore::Sort::merge ( T *  inx,
T *  tmp,
size,
T *  index,
nparts 
) const
private
Sort& casacore::Sort::operator= ( const Sort )

Assignment (copy semantics).

template<typename T >
T casacore::Sort::parSort ( int  nthr,
nrrec,
T *  inx 
) const
private

Do a merge sort, if possible in parallel using OpenMP.

Note that the env.var. OMP_NUM_TRHEADS sets the maximum nr of threads to use. It defaults to the number of cores.

template<typename T >
void casacore::Sort::qkSort ( nr,
T *  indices 
) const
private
template<typename T >
T casacore::Sort::quickSort ( nr,
T *  indices 
) const
private

Do a quicksort, optionally skipping duplicates (qkSort is the actual quicksort function).

template<typename T >
T casacore::Sort::quickSortNoDup ( nr,
T *  indices 
) const
private
template<typename T >
void casacore::Sort::siftDown ( low,
up,
T *  indices 
) const
private

Siftdown algorithm for heapsort.

uInt casacore::Sort::sort ( Vector< uInt > &  indexVector,
uInt  nrrec,
int  options = DefaultSort,
Bool  tryGenSort = True 
) const

Sort the data array of nrrec records.

The result is an array of indices giving the requested order. It returns the number of resulting records. The indices array is resized to that number.
By default it'll try if the faster GenSortIndirect can be used if a sort on a single key is used.

uInt64 casacore::Sort::sort ( Vector< uInt64 > &  indexVector,
uInt64  nrrec,
int  options = DefaultSort,
Bool  tryGenSort = True 
) const
void casacore::Sort::sortKey ( const void *  data,
DataType  ,
uInt  increment = 0,
Order  = Ascending 
)

Define a sort key (the most significant key should be defined first).

The key contains:

  • A pointer to the start of the data array. — When structs are sorted on an element in the struct, the pointer must point to that element in the first struct.
  • A pointer to the comparison object to be used. — The comparison object can be specified in two ways:
    • by giving a basic data type, in which case the appropriate comparison object will be created automatically, or
    • by a CountedPtr of a comparison object. You may want to use the templated comparison classes ObjCompare (), but you are free to use any other class derived from BaseCompare that implements the comp function.
  • The increment from one data element to the next. — When structs are sorted on an element in the struct, the increment should be the size of the struct. If the comparison object is automatically created using the data type specified, the default increment is the size of the data type.
  • The sort order. — Ascending (default) or Descending;

When the data array has been passed to the Sort constructor, the data pointer and the increment arguments can be replaced by a single argument: the offset of the key in each element of the array.

void casacore::Sort::sortKey ( const void *  data,
const CountedPtr< BaseCompare > &  ,
uInt  increment,
Order  = Ascending 
)
void casacore::Sort::sortKey ( uInt  offset,
DataType  ,
Order  = Ascending 
)
void casacore::Sort::sortKey ( uInt  offset,
const CountedPtr< BaseCompare > &  ,
Order  = Ascending 
)
template<typename T >
void casacore::Sort::swap ( index1,
index2,
T *  indices 
) const
inlineprivate

Swap 2 indices.

Definition at line 438 of file Sort.h.

uInt casacore::Sort::unique ( Vector< uInt > &  uniqueVector,
uInt  nrrec 
) const

Get all unique records in a sorted array.

The array order is given in the indexVector (as possibly returned by the sort function). The default indexVector is 0..nrrec-1. The index of each first unique record is returned in the uniqueVector. They are indices in the supplied indexVector, so data[indexVector(uniqueVector(i))] is giving the i-th unique record. Note that the records indexed by indexVector(uniqueVector(i)) till indexVector(uniqueVector(i+1)) are all the same.
It returns the number of unique records. The unique array is resized to that number. The third version also gives back a vector with the keys that change in each sorting group. The size of changeKey is the same as uniqueVector, and for each unique sorting group indicates the index of the keyword that will change at the end of the group.

uInt casacore::Sort::unique ( Vector< uInt > &  uniqueVector,
const Vector< uInt > &  indexVector 
) const
uInt casacore::Sort::unique ( Vector< uInt > &  uniqueVector,
Vector< size_t > &  changeKey,
const Vector< uInt > &  indexVector 
) const
uInt64 casacore::Sort::unique ( Vector< uInt64 > &  uniqueVector,
uInt64  nrrec 
) const
uInt64 casacore::Sort::unique ( Vector< uInt64 > &  uniqueVector,
const Vector< uInt64 > &  indexVector 
) const
uInt64 casacore::Sort::unique ( Vector< uInt64 > &  uniqueVector,
Vector< size_t > &  changeKey,
const Vector< uInt64 > &  indexVector 
) const

Member Data Documentation

const void* casacore::Sort::data_p
private

Definition at line 448 of file Sort.h.

PtrBlock<SortKey*> casacore::Sort::keys_p
private

Definition at line 446 of file Sort.h.

size_t casacore::Sort::nrkey_p
private

Definition at line 447 of file Sort.h.

int casacore::Sort::order_p
private

Definition at line 450 of file Sort.h.

uInt casacore::Sort::size_p
private

Definition at line 449 of file Sort.h.


The documentation for this class was generated from the following file: