casacore
casacore::Regex Class Reference

Regular expression class (based on std::regex).

#include <Regex.h>


Public Member Functions

 Regex ()
 Default constructor uses a zero-length regular expression.
 
 Regex (const String &exp, Bool fast=False, Bool toECMAScript=True)
 Construct a regular expression from the string.
 
void operator= (const String &str)
 Construct a new regex (using the default Regex constructor arguments).
 
const String & regexp () const
 Get the regular expression string.
 
String::size_type match (const Char *s, String::size_type len, String::size_type pos=0) const
 Test if the regular expression matches (first part of) string s.
 
Bool fullMatch (const Char *s, String::size_type len) const
 Test if the regular expression matches the entire string.
 
String::size_type search (const Char *s, String::size_type len, Int &matchlen, Int pos=0) const
 Test if the regular expression occurs anywhere in string s.
 
String::size_type find (const Char *s, String::size_type len, Int &matchlen, String::size_type pos=0) const
 
String::size_type searchBack (const Char *s, String::size_type len, Int &matchlen, uInt pos) const
 Search backwards.
 

Static Public Member Functions

static String toEcma (const String &rx)
 Convert a possibly old-style regex to ECMAScript syntax: unescaped [ and ] inside a bracket expression are escaped, and a digit following a backreference is enclosed in brackets (otherwise ECMAScript would treat it as part of a multi-digit backreference).
 
static String fromPattern (const String &pattern)
 Convert a shell-like pattern to a regular expression string.
 
static String fromSQLPattern (const String &pattern)
 Convert an SQL-like pattern to a regular expression string.
 
static String fromString (const String &str)
 Convert a normal string to a regular expression string.
 
static String makeCaseInsensitive (const String &str)
 Create a case-insensitive regular expression string from the given regular expression string.
 

Protected Attributes

String itsStr
 

Friends

ostream & operator<< (ostream &ios, const Regex &exp)
 Write the regex string.
 

Detailed Description

Regular expression class (based on std::regex)

Intended use:

Public interface

Review Status

Reviewed By:
Friso Olnon
Date Reviewed:
1995/03/20
Test programs:
tRegex

Synopsis

This class provides regular expression functionality, such as matching and searching in strings, comparison of expressions, and input/output. It is built on the standard C++ regular expression class (std::regex) using the ECMAScript syntax. That syntax is almost the same as the one used until March 2019, when the implementation was based on GNU's cregex.cc. ECMAScript offers more functionality (such as non-greedy matching), but there are two small differences. First, brackets are treated differently: in the old Regex a [ or ] inside a bracket expression did not need to be escaped, whereas in ECMAScript it does. Second, the old Regex allowed at most 9 backreferences, so \15 meant the first backreference followed by a 5; in ECMAScript it means the 15th backreference, and the 5 has to be separated from the backreference to get the old meaning. The Regex constructor resolves both differences by adding escape characters and brackets as needed, so existing code using Regex does not need to be changed.
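Both differences, and the non-greedy matching mentioned above, can be observed directly with std::regex in ECMAScript mode. The sketch below uses only the C++ standard library, not casacore; the helper firstMatch is hypothetical and simply returns the first matching substring.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: return the first substring of `text` matched by the
// ECMAScript regular expression `rx`, or an empty string if nothing matches.
std::string firstMatch(const std::string& rx, const std::string& text) {
    std::smatch m;
    const std::regex re(rx, std::regex::ECMAScript);
    if (std::regex_search(text, m, re)) {
        return m.str(0);
    }
    return "";
}
```

For example, the greedy firstMatch("a.*b", "aXbYb") returns "aXbYb", while the non-greedy firstMatch("a.*?b", "aXbYb") returns "aXb". The C++ literal "(a)\\1[5]" (regex (a)\1[5]) matches "aa5" in "xaa5", because the bracketed 5 is kept apart from backreference 1.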

Apart from proper regular expressions, it also supports glob patterns (UNIX file name patterns) by means of a conversion to a proper regex string. Also ordinary strings and SQL-style patterns can be converted to a proper regex string.

See http://www.cplusplus.com/reference/regex/ECMAScript for the syntax.

^
matches the beginning of a line.
$
matches the end of a line.
.
matches any character
*
matches the previous subexpression zero or more times.
+
matches the previous subexpression one or more times.
?
matches the previous subexpression zero times or once.
{n,m}
interval operator to specify how many times the previous subexpression can match. See the man page of egrep or regexp for more detail.
[]
matches any character inside the brackets; e.g. [abc]. A hyphen can be used for a character range; e.g. [a-z].
A ^ right after the opening bracket indicates "not"; e.g. [^abc] means any character but a, b, and c. If ^ is not the first character, it is a literal caret. If - is the last character, it is a literal hyphen. If ] is the first character, it is a literal closing bracket.
Special character classes are [:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:], [:space:], [:print:], [:punct:], [:graph:], and [:cntrl:]. The brackets are part of the name; e.g. [^[:upper:]] is equal to [^A-Z]. Note that [:upper:] is more portable, because A-Z fails for the EBCDIC character set.
()
grouping to change the normal operator precedence.
|
alternation operator; matches either the left side or the right side.
\1 to \9
backreference to a subexpression; matches the part of the string that equals the part matched by that subexpression.

Special characters must be escaped with a backslash to be used literally; inside square brackets, however, they should not be escaped. See the man page of egrep or regexp for more information about regular expressions.
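The operators listed above can be tried out with std::regex_match and the ECMAScript grammar that Regex is built on. This is a standard-library sketch, not the casacore API; matchesAll is a hypothetical helper that requires the whole string to match.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the entire `text` matches the ECMAScript
// regular expression `rx` (a prefix or inner match is not enough).
bool matchesAll(const std::string& rx, const std::string& text) {
    return std::regex_match(text, std::regex(rx, std::regex::ECMAScript));
}
```

For instance, matchesAll("[[:upper:]][a-z]*", "Regex") and matchesAll("(cat|dog)s?", "dogs") are true, matchesAll("ab{2,3}c", "abc") is false, and the backreference in matchesAll("(ab)x\\1", "abxab") makes it true. The escaped dot in "3\\.14" rejects "3x14", whereas "[.*+]+" accepts "*.+" because the characters inside the brackets need no escaping.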

Several global Regex objects are predefined for common functionality.

RXwhite
one or more whitespace characters
RXint
integer number (optionally signed)
RXdouble
double number (optionally with an exponent introduced by e or E)
RXalpha
one or more alphabetic characters (lowercase and/or uppercase)
RXlowercase
one or more lowercase alphabetic characters
RXuppercase
one or more uppercase alphabetic characters
RXalphanum
one or more alphabetic/numeric characters (lowercase and/or uppercase)
RXidentifier
identifier name (first an alphabetic character or underscore, then zero or more alphanumeric characters and/or underscores)

The static member function fromPattern converts a shell-like pattern to a String which can be used to create a Regex from it. A pattern has the following special characters:

*
Zero or more arbitrary characters.
?
One arbitrary character
[]
The same as [] in a regular expression (see above). In addition to ^ a ! can be used to indicate "not".
{,}
A brace expression which is like brace expansion in some shells. It is similar to the | construct in a regular expression.
E.g. {abc,defg} means abc or defg. Brace expressions can be nested and can contain other special characters.
E.g. St{Man*.{h,cc},Col?*.{h,cc,l,y}}
A literal comma or brace in a brace expression can be given by escaping it with a backslash.
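The conversion can be sketched as a single pass over the pattern. The function globToRegex below is hypothetical and is not the casacore implementation; it assumes that brace alternatives are emitted as ((a)|(b)), which agrees with the fromPattern example in the Example section, and it does not validate unbalanced braces or brackets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of a glob-to-regex conversion in the spirit of
// Regex::fromPattern; not the casacore implementation.
std::string globToRegex(const std::string& pattern) {
    const std::string special = ".^$|()+{}[]\\*?";
    std::string out;
    int braceDepth = 0;                            // nesting level of {...}
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\' && i + 1 < pattern.size()) {
            // Escaped character (e.g. \, or \{): always taken literally.
            const char n = pattern[++i];
            if (special.find(n) != std::string::npos) out += '\\';
            out += n;
        } else if (c == '*') {
            out += ".*";                           // zero or more characters
        } else if (c == '?') {
            out += '.';                            // exactly one character
        } else if (c == '[') {
            // Copy the bracket expression; a leading ! or ^ means "not".
            out += '[';
            if (i + 1 < pattern.size() &&
                (pattern[i + 1] == '!' || pattern[i + 1] == '^')) {
                out += '^';
                ++i;
            }
            while (++i < pattern.size() && pattern[i] != ']') out += pattern[i];
            out += ']';
        } else if (c == '{') {
            ++braceDepth;
            out += "((";                           // start of alternatives
        } else if (c == ',' && braceDepth > 0) {
            out += ")|(";                          // next alternative
        } else if (c == '}' && braceDepth > 0) {
            --braceDepth;
            out += "))";                           // end of alternatives
        } else {
            if (special.find(c) != std::string::npos) out += '\\';
            out += c;                              // ordinary character
        }
    }
    return out;
}
```

With this sketch, globToRegex("St*.{h,cc}") yields St.*\.((h)|(cc)), and nested braces such as {a,b{c,d}} become ((a)|(b((c)|(d)))).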

The static member function fromSQLPattern converts an SQL-like pattern to a String which can be used to create a Regex from it. A pattern has the following special characters:

%
Zero or more arbitrary characters.
_
One arbitrary character
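A corresponding sketch for SQL-like patterns only needs to translate % and _ and escape everything that is special in a regular expression. The name sqlToRegex is hypothetical, and this sketch does not handle an escape character for a literal % or _.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of an SQL-pattern-to-regex conversion in the spirit
// of Regex::fromSQLPattern; not the casacore implementation.
std::string sqlToRegex(const std::string& pattern) {
    const std::string special = ".^$|()[]{}*+?\\";
    std::string out;
    for (char c : pattern) {
        if (c == '%') {
            out += ".*";          // zero or more arbitrary characters
        } else if (c == '_') {
            out += '.';           // exactly one arbitrary character
        } else {
            if (special.find(c) != std::string::npos) out += '\\';
            out += c;             // everything else is taken literally
        }
    }
    return out;
}
```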

The static member function fromString converts a normal string to a regular expression. This function escapes characters in the string which are special in a regular expression. In this way a normal string can be passed to a function taking a regular expression.
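A minimal sketch of such escaping, assuming the ECMAScript special characters . ^ $ | ( ) [ ] { } * + ? and backslash; escapeRegex is a hypothetical name, not the casacore function. The result can be passed to std::regex to match the original text literally.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical sketch in the spirit of Regex::fromString: prefix every
// character that is special in an ECMAScript regex with a backslash.
std::string escapeRegex(const std::string& str) {
    const std::string special = ".^$|()[]{}*+?\\";
    std::string out;
    for (char c : str) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}
```

For example, escapeRegex("tRegex.cc") gives tRegex\.cc, the same string as in the fromString example in the Example section.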

The static member function makeCaseInsensitive returns a new regular expression string containing the case-insensitive version of the given expression string.
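One way to build such a string is to replace each letter by a bracket expression holding both cases. The function caseInsensitive below is a hypothetical, simplified sketch, not the casacore implementation: escape sequences and whole bracket expressions are copied unchanged, so letters inside [...] remain case-sensitive here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical sketch in the spirit of Regex::makeCaseInsensitive.
// Each letter outside a bracket expression becomes e.g. [aA]. Escape
// sequences (e.g. \d or \.) and bracket expressions are copied unchanged.
std::string caseInsensitive(const std::string& rx) {
    std::string out;
    for (std::size_t i = 0; i < rx.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(rx[i]);
        if (c == '\\' && i + 1 < rx.size()) {
            out += rx[i];
            out += rx[++i];                      // keep the escape sequence
        } else if (c == '[') {
            // Find the closing ']', skipping class names such as [:digit:]
            // and escaped characters.
            std::size_t j = i + 1;
            while (j < rx.size() && rx[j] != ']') {
                if (rx[j] == '[' && j + 1 < rx.size() && rx[j + 1] == ':') {
                    const std::size_t e = rx.find(":]", j + 2);
                    j = (e == std::string::npos) ? rx.size() : e + 2;
                } else {
                    j += (rx[j] == '\\') ? 2 : 1;
                }
            }
            out += rx.substr(i, j + 1 - i);      // substr clamps at the end
            i = j;
        } else if (std::isalpha(c)) {
            out += '[';
            out += static_cast<char>(std::tolower(c));
            out += static_cast<char>(std::toupper(c));
            out += ']';
        } else {
            out += rx[i];
        }
    }
    return out;
}
```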

Example

Regex RXwhite("[ \n\t\r\v\f]+"); // blank, newline, tab, return, vertical tab, formfeed
Regex RXint("[-+]?[0-9]+");
Regex RXdouble("[-+]?(([0-9]+\\.[0-9]*)|([0-9]+)|(\\.[0-9]+))([eE][+-]?[0-9]+)?");
Regex RXalpha("[A-Za-z]+");
Regex RXlowercase("[a-z]+");
Regex RXuppercase("[A-Z]+");
Regex RXalphanum("[0-9A-Za-z]+");
Regex RXidentifier("[A-Za-z_][A-Za-z0-9_]*");

In RXdouble the . is escaped with a backslash to match it literally. The second backslash is needed to escape the backslash itself in the C++ string literal.
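The same expression strings can be checked with std::regex_match, which requires the whole string to match. This sketch mirrors three of the predefined objects with standard-library regexes; the names rxInt, rxDouble, rxIdentifier and the is... helpers are hypothetical, and casacore code would use the predefined Regex objects instead.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Standard-library stand-ins using the expression strings shown above.
const std::regex rxInt("[-+]?[0-9]+");
const std::regex rxDouble(
    "[-+]?(([0-9]+\\.[0-9]*)|([0-9]+)|(\\.[0-9]+))([eE][+-]?[0-9]+)?");
const std::regex rxIdentifier("[A-Za-z_][A-Za-z0-9_]*");

// Each helper tests whether the entire string has the given form.
bool isInt(const std::string& s)        { return std::regex_match(s, rxInt); }
bool isDouble(const std::string& s)     { return std::regex_match(s, rxDouble); }
bool isIdentifier(const std::string& s) { return std::regex_match(s, rxIdentifier); }
```

Here isDouble("1.5e-3") and isDouble(".5") are true, while isDouble(".") is false because every alternative of the mantissa requires at least one digit.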

Regex rx1 (Regex::fromPattern ("St*.{h,cc}"));
// results in regexp "St.*\.((h)|(cc))"
Regex rx2 (Regex::fromString ("tRegex.cc"));
// results in regexp "tRegex\.cc"

Definition at line 206 of file Regex.h.

Constructor & Destructor Documentation

casacore::Regex::Regex ( )

Default constructor uses a zero-length regular expression.

casacore::Regex::Regex ( const String & exp,
Bool  fast = False,
Bool  toECMAScript = True 
)
explicit

Construct a regular expression from the string.

If toECMAScript=True, the function toEcma is called to convert the old cregex syntax to the new ECMAScript syntax. If fast=True, matching speed is favored over the speed of constructing the regex object.
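In std::regex, the closest analogue of this trade-off is the std::regex::optimize flag, which asks the implementation to favor matching speed over construction speed. That casacore maps fast=True onto this flag is an assumption here, and makeRegex is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: build an ECMAScript regex and, if `fast` is set,
// request that matching speed be favored over construction speed.
std::regex makeRegex(const std::string& exp, bool fast) {
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (fast) flags = flags | std::regex::optimize;
    return std::regex(exp, flags);
}
```

The flag only affects performance; both variants accept the same strings.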

Member Function Documentation

String::size_type casacore::Regex::find ( const Char * s,
String::size_type  len,
Int & matchlen,
String::size_type  pos = 0 
) const

static String casacore::Regex::fromPattern ( const String & pattern)
static

Convert a shell-like pattern to a regular expression string.

This is useful for people who are more familiar with patterns than with regular expressions.

static String casacore::Regex::fromSQLPattern ( const String & pattern)
static

Convert an SQL-like pattern to a regular expression string.

This is useful for TaQL, which mimics SQL.

static String casacore::Regex::fromString ( const String & str)
static

Convert a normal string to a regular expression string.

This consists of escaping the special characters. This is useful when one wants to provide a normal string (which may contain special characters) to a function working on regular expressions.

Bool casacore::Regex::fullMatch ( const Char * s,
String::size_type  len 
) const

Test if the regular expression matches the entire string.
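The difference from match and search can be sketched with std::regex_match over the character range [s, s+len), which also works for buffers containing null characters. fullMatchSketch is a hypothetical helper, not the casacore implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>

// Hypothetical sketch of fullMatch semantics: the regex must match all len
// characters of the buffer, not just a prefix or an inner substring.
bool fullMatchSketch(const std::regex& rx, const char* s, std::size_t len) {
    return std::regex_match(s, s + len, rx);
}
```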

static String casacore::Regex::makeCaseInsensitive ( const String & str)
static

Create a case-insensitive regular expression string from the given regular expression string.

It does so by inserting the lowercase and uppercase versions of the characters in the input string into the output string.

String::size_type casacore::Regex::match ( const Char * s,
String::size_type  len,
String::size_type  pos = 0 
) const

Test if the regular expression matches (first part of) string s.

The return value gives the length of the matching string part, or String::npos if there is no match or an error occurs. The string has len characters and the test starts at position pos. The string may contain null characters. A negative pos is allowed to define the start relative to the end of the string.

Tip: Use the appropriate String functions to test whether a string matches a regular expression; Regex::match is fairly low-level.
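The anchored test at position pos can be sketched with std::regex_search and the match_continuous flag, which only accepts a match starting at the beginning of the searched range. matchAt is a hypothetical helper; it handles non-negative pos only and returns the length of the match that the ECMAScript engine finds at pos.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical sketch of match semantics: try to match rx starting exactly
// at position pos of the buffer s[0, len) and return the length of the
// matched part, or std::string::npos if there is no match at pos.
std::size_t matchAt(const std::regex& rx, const char* s, std::size_t len,
                    std::size_t pos = 0) {
    if (pos > len) return std::string::npos;
    auto flags = std::regex_constants::match_continuous;  // anchor at pos
    if (pos > 0) {
        // Characters before pos exist, so pos is not the start of the input.
        flags = flags | std::regex_constants::match_prev_avail;
    }
    std::cmatch m;
    if (std::regex_search(s + pos, s + len, m, rx, flags)) {
        return static_cast<std::size_t>(m.length(0));
    }
    return std::string::npos;
}
```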

void casacore::Regex::operator= ( const String & str)

Construct a new regex (using the default Regex constructor arguments).

const String& casacore::Regex::regexp ( ) const
inline

Get the regular expression string.

Definition at line 251 of file Regex.h.

References itsStr.

Referenced by casacore::TaqlRegex::match().

String::size_type casacore::Regex::search ( const Char * s,
String::size_type  len,
Int & matchlen,
Int  pos = 0 
) const

Test if the regular expression occurs anywhere in string s.

The return value gives the position of the first substring matching the regular expression. The length of that substring is returned in matchlen. The string has len characters and the test starts at position pos. The string may contain null characters. If the pos given is negative, the search starts -pos from the end.

Tip: Use the appropriate String functions to test whether a regular expression occurs in a string; Regex::search is fairly low-level.
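The documented behavior of search, including a negative pos counted from the end, can be sketched with std::regex_search on a sub-range of the buffer. searchFrom is a hypothetical helper, not the casacore implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical sketch of search semantics on s[0, len): return the position
// of the first match at or after the start position and store its length
// in matchlen, or return std::string::npos if there is none. A negative pos
// starts the search -pos characters from the end.
std::size_t searchFrom(const std::regex& rx, const char* s, std::size_t len,
                       int& matchlen, int pos = 0) {
    const long start = pos < 0 ? static_cast<long>(len) + pos : pos;
    if (start < 0 || static_cast<std::size_t>(start) > len) {
        return std::string::npos;
    }
    // When start > 0, tell the engine that earlier characters exist.
    const auto flags = start > 0 ? std::regex_constants::match_prev_avail
                                 : std::regex_constants::match_default;
    std::cmatch m;
    if (!std::regex_search(s + start, s + len, m, rx, flags)) {
        return std::string::npos;
    }
    matchlen = static_cast<int>(m.length(0));
    return static_cast<std::size_t>(start + m.position(0));
}
```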

String::size_type casacore::Regex::searchBack ( const Char * s,
String::size_type  len,
Int & matchlen,
uInt  pos 
) const

Search backwards.

static String casacore::Regex::toEcma ( const String & rx)
static

Convert a possibly old-style regex to ECMAScript syntax: unescaped [ and ] inside a bracket expression are escaped, and a digit following a backreference is enclosed in brackets (otherwise ECMAScript would treat it as part of a multi-digit backreference).
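The two rewrites can be sketched as one pass over the expression. toEcmaSketch is hypothetical and is not the casacore implementation; it assumes the old rules that a ] directly after [ or [^ is literal and that a [ inside brackets is literal unless it starts a class name such as [:upper:].

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of the rewrites described for Regex::toEcma:
//  1. Inside a bracket expression, a literal '[' and a leading ']' are
//     escaped with a backslash; class names like [:upper:] are kept.
//  2. Outside brackets, a digit following a backreference \1..\9 is put in
//     brackets, so \15 (backreference 1, then '5') becomes \1[5].
std::string toEcmaSketch(const std::string& rx) {
    std::string out;
    bool inBracket = false;
    for (std::size_t i = 0; i < rx.size(); ++i) {
        const char c = rx[i];
        if (c == '\\' && i + 1 < rx.size()) {
            const char n = rx[++i];
            out += c;
            out += n;
            const bool digitFollows =
                i + 1 < rx.size() && rx[i + 1] >= '0' && rx[i + 1] <= '9';
            if (!inBracket && n >= '1' && n <= '9' && digitFollows) {
                out += '[';
                out += rx[++i];
                out += ']';
            }
        } else if (!inBracket) {
            out += c;
            if (c == '[') {
                inBracket = true;
                if (i + 1 < rx.size() && rx[i + 1] == '^') out += rx[++i];
                if (i + 1 < rx.size() && rx[i + 1] == ']') {
                    out += "\\]";                // leading ']' is literal
                    ++i;
                }
            }
        } else if (c == '[' && i + 1 < rx.size() && rx[i + 1] == ':') {
            // Copy a class name such as [:upper:] unchanged.
            const std::size_t end = rx.find(":]", i + 2);
            const std::size_t stop = (end == std::string::npos) ? rx.size() : end + 2;
            out += rx.substr(i, stop - i);
            i = stop - 1;
        } else if (c == '[') {
            out += "\\[";                        // literal '[' inside brackets
        } else {
            if (c == ']') inBracket = false;     // end of bracket expression
            out += c;
        }
    }
    return out;
}
```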

Friends And Related Function Documentation

ostream& operator<< ( ostream &  ios,
const Regex & exp 
)
friend

Write the regex string.

Member Data Documentation

String casacore::Regex::itsStr
protected

Definition at line 304 of file Regex.h.

Referenced by regexp().


The documentation for this class was generated from the following file: