casacore
Regular expression class (based on std::regex)
#include <Regex.h>
Public Member Functions
Regex ()
  Default constructor uses a zero-length regular expression.
Regex (const String &exp, Bool fast=False, Bool toECMAScript=True)
  Construct a regular expression from the string.
void operator= (const String &str)
  Construct a new regex (using the default Regex constructor arguments).
const String & regexp () const
  Get the regular expression string.
String::size_type match (const Char *s, String::size_type len, String::size_type pos=0) const
  Test if the regular expression matches (the first part of) string s.
Bool fullMatch (const Char *s, String::size_type len) const
  Test if the regular expression matches the entire string.
String::size_type search (const Char *s, String::size_type len, Int &matchlen, Int pos=0) const
  Test if the regular expression occurs anywhere in string s.
String::size_type find (const Char *s, String::size_type len, Int &matchlen, String::size_type pos=0) const
String::size_type searchBack (const Char *s, String::size_type len, Int &matchlen, uInt pos) const
  Search backwards.

Static Public Member Functions
static String toEcma (const String &rx)
  Convert a possibly old-style regex to an ECMAScript regex: unescaped [ and ] inside a bracket expression are escaped, and a digit following a backreference is enclosed in brackets (otherwise the backreference would span multiple digits).
static String fromPattern (const String &pattern)
  Convert a shell-like pattern to a regular expression string.
static String fromSQLPattern (const String &pattern)
  Convert an SQL-like pattern to a regular expression string.
static String fromString (const String &str)
  Convert a normal string to a regular expression string.
static String makeCaseInsensitive (const String &str)
  Create a case-insensitive regular expression string from the given regular expression string.

Protected Attributes
String itsStr

Friends
ostream & operator<< (ostream &ios, const Regex &exp)
  Write the regex string.
Regular expression class (based on std::regex)
Public interface
This class provides regular expression functionality, such as matching and searching in strings, comparison of expressions, and input/output. It is built on the standard C++ regular expression class (std::regex) using the ECMAScript syntax. This syntax is almost the same as the regular expression syntax used until March 2019, which was based on GNU's cregex.cc. ECMAScript offers more functionality (such as non-greedy matching), but there is a slight difference in how brackets are used: in the old Regex, [ and ] did not need to be escaped inside a bracket expression, whereas ECMAScript requires it. Furthermore, the old Regex allowed at most 9 backreferences, so \15 meant the first backreference followed by a 5. In ECMAScript it means the 15th backreference, and parentheses are needed to get the old meaning. The Regex constructor resolves these differences by adding escape characters as needed (when toECMAScript=True, the default), so existing code using Regex does not need to be changed.
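The non-greedy matching and backreferences mentioned above can be illustrated with the standard library alone. This is a minimal sketch that uses std::regex with the ECMAScript grammar directly, not casacore's Regex class; the helper name firstMatch is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return the first substring of `text` matched by `pattern`
// (interpreted with std::regex's ECMAScript grammar), or an empty string if
// nothing matches.
std::string firstMatch(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern, std::regex::ECMAScript))) {
        return m.str(0);
    }
    return std::string();
}
```

With this helper, firstMatch("<a><b>", "<.*>") yields the greedy match "<a><b>", whereas the non-greedy firstMatch("<a><b>", "<.*?>") yields "<a>". A backreference such as "(ab)\\1" matches "abab".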
Apart from proper regular expressions, it also supports glob patterns (UNIX file name patterns) by converting them to a proper regex string. Ordinary strings and SQL-style patterns can also be converted to a proper regex string.
See http://www.cplusplus.com/reference/regex/ECMAScript for the syntax.
[abc] means any of the characters a, b, or c. A hyphen can be used for a character range; e.g. [a-z].
[^abc] means any character but a, b, and c. If ^ is not the first character, it is a literal caret. If - is the last character, it is a literal hyphen. If ] is the first character, it is a literal closing bracket.
[^[:upper:]] is equal to [^A-Z]. Note that [:upper:] is more portable, because A-Z fails for the EBCDIC character set.
Special characters have to be escaped with a backslash to use them literally. Inside square brackets, however, escaping should not be done. See the man page of egrep or regexp for more information about regular expressions.
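The bracket-expression rules above can be checked against plain std::regex with the ECMAScript grammar. This is a sketch only: it bypasses casacore's Regex (which may first rewrite the expression via toEcma), the helper name matchesAll is hypothetical, and the "] first" rule is not exercised because plain ECMAScript may treat it differently.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`
// under std::regex's ECMAScript grammar.
bool matchesAll(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern, std::regex::ECMAScript));
}
```

For example, matchesAll("d", "[^abc]") holds, matchesAll("^", "[a^]") treats the non-leading caret literally, and matchesAll("X", "[^[:upper:]]") is false. Outside brackets, "a\\.b" matches only a literal dot.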
Several global Regex objects are predefined for common functionality.
The static member function fromPattern converts a shell-like pattern to a String which can be used to create a Regex from it. A pattern has the following special characters:
{abc,defg} means abc or defg. Brace expressions can be nested and can contain other special characters.
The static member function fromSQLPattern converts an SQL-like pattern to a String which can be used to create a Regex from it. A pattern has the following special characters:
The static member function fromString converts a normal string to a regular expression. This function escapes characters in the string which are special in a regular expression. In this way a normal string can be passed to a function taking a regular expression.
The static member function makeCaseInsensitive returns a new regular expression string containing the case-insensitive version of the given expression string.
In RXdouble, the . is escaped with a backslash so that it matches a literal dot. The second backslash is needed to escape the first backslash in the C++ string literal.
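The RXdouble definition itself is not reproduced here. As a hedged illustration of the double backslash, the sketch below uses a hypothetical, simplified decimal pattern with plain std::regex (it is not casacore's actual RXdouble), and shows the equivalent raw-string form that needs only one backslash.

```cpp
#include <regex>
#include <string>

// Hypothetical decimal-number pattern (NOT casacore's RXdouble).
// In the regex, \. matches a literal dot; in an ordinary C++ string literal
// that backslash must itself be escaped, giving "\\.".
const std::regex kDecimal("-?[0-9]+\\.[0-9]*", std::regex::ECMAScript);

// The same pattern as a raw string literal needs only a single backslash.
const std::regex kDecimalRaw(R"(-?[0-9]+\.[0-9]*)", std::regex::ECMAScript);

bool isDecimal(const std::string& s) {
    return std::regex_match(s, kDecimal);
}
```

Without the escape, the pattern "[0-9]+.[0-9]+" would also accept "3x14", because an unescaped . matches any character.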
casacore::Regex::Regex ()
Default constructor uses a zero-length regular expression.
casacore::Regex::Regex (const String &exp, Bool fast = False, Bool toECMAScript = True)
Construct a regular expression from the string.
If toECMAScript=True, function toEcma is called to convert the old cregex syntax to the new ECMAScript syntax. If fast=True, matching efficiency is preferred over the efficiency of constructing the regex object.
String::size_type casacore::Regex::find (const Char *s, String::size_type len, Int &matchlen, String::size_type pos = 0) const
static String casacore::Regex::fromPattern (const String &pattern)
Convert a shell-like pattern to a regular expression string.
This is useful for people who are more familiar with patterns than with regular expressions.
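To show what such a conversion involves, here is a hypothetical glob-to-regex converter. It is NOT casacore's fromPattern implementation; it assumes the common shell rules ('*' = any sequence, '?' = any single character, '[...]' = bracket expression copied unchanged, '{a,b}' = alternation, possibly nested) and escapes other regex metacharacters. The names globToRegex and globMatch are illustrative only.

```cpp
#include <regex>
#include <string>

// Hypothetical converter (not casacore's code) under the assumed glob rules above.
std::string globToRegex(const std::string& glob) {
    std::string rx;
    int braceDepth = 0;      // nesting level of {...} groups
    bool inBracket = false;  // currently inside a [...] bracket expression?
    for (char c : glob) {
        if (inBracket) {     // copy bracket contents unchanged
            rx += c;
            if (c == ']') inBracket = false;
            continue;
        }
        switch (c) {
            case '*': rx += ".*"; break;
            case '?': rx += '.'; break;
            case '[': rx += c; inBracket = true; break;
            case '{': rx += '('; ++braceDepth; break;
            case '}':
                if (braceDepth > 0) { rx += ')'; --braceDepth; }
                else { rx += "\\}"; }
                break;
            case ',': rx += (braceDepth > 0) ? '|' : ','; break;
            case '.': case '+': case '(': case ')': case '|':
            case '^': case '$': case '\\':
                rx += '\\';  // escape regex metacharacters
                rx += c;
                break;
            default:
                rx += c;
                break;
        }
    }
    return rx;
}

// True if `name` as a whole matches the glob.
bool globMatch(const std::string& name, const std::string& glob) {
    return std::regex_match(name, std::regex(globToRegex(glob)));
}
```

Under these assumptions, "file?.{c,h}" becomes file.\.(c|h), and nested braces such as "{a,b{c,d}}" become (a|b(c|d)).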
static String casacore::Regex::fromSQLPattern (const String &pattern)
Convert an SQL-like pattern to a regular expression string.
This is useful for TaQL, which mimics SQL.
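A hypothetical sketch of an SQL-like conversion follows. It assumes the standard SQL LIKE wildcards ('%' = zero or more characters, '_' = exactly one character) and is not casacore's fromSQLPattern; sqlToRegex and sqlLike are illustrative names.

```cpp
#include <regex>
#include <string>

// Hypothetical converter (not casacore's code), assuming standard SQL LIKE
// wildcards. Every other ECMAScript metacharacter is escaped so it is literal.
std::string sqlToRegex(const std::string& pattern) {
    static const std::string special = ".^$|()[]{}*+?\\";
    std::string rx;
    for (char c : pattern) {
        if (c == '%') {
            rx += ".*";              // any sequence, possibly empty
        } else if (c == '_') {
            rx += '.';               // exactly one character
        } else {
            if (special.find(c) != std::string::npos) rx += '\\';
            rx += c;
        }
    }
    return rx;
}

// True if `text` as a whole matches the SQL-like pattern.
bool sqlLike(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(sqlToRegex(pattern)));
}
```

Here "3._" accepts "3.5" but not "3x5", because the literal dot is escaped while '_' stands for any single character.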
static String casacore::Regex::fromString (const String &str)
Convert a normal string to a regular expression string.
This consists of escaping the special characters. It is useful when one wants to pass a normal string (which may contain special characters) to a function working on regular expressions.
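The escaping idea can be sketched as follows. This is a hypothetical helper (escapeRegex), not casacore's fromString; it assumes the ECMAScript metacharacter set listed in the code and prefixes each one with a backslash.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not casacore's code): escape every ECMAScript
// metacharacter so the resulting pattern matches `str` literally.
std::string escapeRegex(const std::string& str) {
    static const std::string special = ".^$|()[]{}*+?\\";
    std::string rx;
    rx.reserve(str.size() * 2);
    for (char c : str) {
        if (special.find(c) != std::string::npos) rx += '\\';
        rx += c;
    }
    return rx;
}
```

Used unescaped, the pattern "1+1=2" would also match "11=2" (since + means "one or more"). After escapeRegex it only matches the literal text "1+1=2".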
Bool casacore::Regex::fullMatch (const Char *s, String::size_type len) const
Test if the regular expression matches the entire string.
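As a rough analogue in standard C++, full-match semantics over a (pointer, length) buffer correspond to std::regex_match on an iterator range. This sketch assumes fullMatch means "the pattern covers all len characters"; fullMatchSketch is a hypothetical name, not part of casacore.

```cpp
#include <cstddef>
#include <regex>

// Hypothetical sketch: true if `rx` matches all `len` characters of `s`.
// The (first, last) overload does not rely on a terminating null.
bool fullMatchSketch(const char* s, std::size_t len, const std::regex& rx) {
    return std::regex_match(s, s + len, rx);
}
```

Passing a shorter len restricts the test to a prefix of the buffer, so "abbbc" with len 4 fully matches "ab+".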
static String casacore::Regex::makeCaseInsensitive (const String &str)
Create a case-insensitive regular expression string from the given regular expression string.
It does so by inserting both the lowercase and the uppercase version of the characters in the input string into the output string.
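The technique described above can be sketched as follows. This is a hypothetical simplification (caseInsensitive and matchesIgnoringCase are illustrative names, not casacore's code): each letter outside a bracket expression becomes a bracket expression holding both cases, while escapes such as \d and the contents of [...] are copied unchanged.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical sketch (not casacore's code): letter x outside [...] -> [xX].
std::string caseInsensitive(const std::string& rx) {
    std::string out;
    bool inBracket = false;
    for (std::size_t i = 0; i < rx.size(); ++i) {
        char c = rx[i];
        if (c == '\\' && i + 1 < rx.size()) {
            out += c;
            out += rx[++i];   // keep escapes such as \d or \. intact
            continue;
        }
        if (inBracket) {
            if (c == ']') inBracket = false;
            out += c;         // bracket contents are copied unchanged
        } else if (c == '[') {
            inBracket = true;
            out += c;
        } else if (std::isalpha(static_cast<unsigned char>(c))) {
            out += '[';
            out += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
            out += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            out += ']';
        } else {
            out += c;
        }
    }
    return out;
}

// True if `text` as a whole matches `rx`, ignoring the case of letters.
bool matchesIgnoringCase(const std::string& text, const std::string& rx) {
    return std::regex_match(text, std::regex(caseInsensitive(rx)));
}
```

For example, "hello" becomes [hH][eE][lL][lL][oO], which matches "HeLLo". A fuller implementation would also need to handle letters inside bracket expressions.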
String::size_type casacore::Regex::match (const Char *s, String::size_type len, String::size_type pos = 0) const
Test if the regular expression matches (the first part of) string s.
The return value gives the length of the matching string part, or String::npos if there is no match or an error. The string has len characters and the test starts at position pos. The string may contain null characters. A negative pos is allowed to define the start from the end.
Tip: Use the appropriate String functions to test if a string matches a regular expression; Regex::match is pretty low-level.
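The documented return convention of match (length of a match anchored at pos, or npos) can be mirrored with std::regex_search and the match_continuous flag. This is a sketch under that reading, not casacore's implementation; prefixMatchLength is a hypothetical name, and negative start positions are not modelled.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical sketch: length of the match starting exactly at `pos` in the
// `len`-character buffer `s`, or std::string::npos if there is none.
std::size_t prefixMatchLength(const char* s, std::size_t len,
                              const std::regex& rx, std::size_t pos = 0) {
    if (pos > len) return std::string::npos;
    std::cmatch m;
    // match_continuous: the match must begin at s + pos.
    if (std::regex_search(s + pos, s + len, m, rx,
                          std::regex_constants::match_continuous)) {
        return static_cast<std::size_t>(m.length(0));
    }
    return std::string::npos;
}
```

For example, "ab*" gives length 4 on "abbbc", no match on "xabb" from position 0, and length 3 on "xabb" from position 1.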
void casacore::Regex::operator= (const String &str)
Construct a new regex (using the default Regex constructor arguments).
const String & casacore::Regex::regexp () const [inline]
Get the regular expression string.
Definition at line 251 of file Regex.h.
References itsStr.
Referenced by casacore::TaqlRegex::match().
String::size_type casacore::Regex::search (const Char *s, String::size_type len, Int &matchlen, Int pos = 0) const
Test if the regular expression occurs anywhere in string s.
The return value gives the position of the first substring matching the regular expression. The length of that substring is returned in matchlen. The string has len characters and the search starts at position pos. The string may contain null characters. If pos is negative, the search starts -pos characters from the end.
Tip: Use the appropriate String functions to test if a regular expression occurs in a string; Regex::search is pretty low-level.
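The position-plus-length convention of search can be sketched with std::regex_search over a pointer range. This is a hypothetical analogue (searchSketch), not casacore's code: it handles only non-negative start positions, and setting matchlen to 0 when nothing matches is an assumption of this sketch.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical sketch: position of the first match at or after `pos` in the
// `len`-character buffer `s`; its length goes to `matchlen`.
// Returns std::string::npos (and sets matchlen to 0) if there is no match.
std::size_t searchSketch(const char* s, std::size_t len, int& matchlen,
                         const std::regex& rx, std::size_t pos = 0) {
    std::cmatch m;
    if (pos <= len && std::regex_search(s + pos, s + len, m, rx)) {
        matchlen = static_cast<int>(m.length(0));
        // m.position(0) is relative to s + pos, so add pos back.
        return pos + static_cast<std::size_t>(m.position(0));
    }
    matchlen = 0;
    return std::string::npos;
}
```

Searching "ab123cd45" for "[0-9]+" finds position 2 with length 3. Starting at position 5, it finds position 7 with length 2.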
String::size_type casacore::Regex::searchBack (const Char *s, String::size_type len, Int &matchlen, uInt pos) const
Search backwards.
static String casacore::Regex::toEcma (const String &rx)
Convert a possibly old-style regex to an ECMAScript regex: unescaped [ and ] inside a bracket expression are escaped, and a digit following a backreference is enclosed in brackets (otherwise the backreference would span multiple digits).
ostream & operator<< (ostream &ios, const Regex &exp) [friend]
Write the regex string.
String casacore::Regex::itsStr [protected]