Lrexlib provides bindings of the two principal regular expression library interfaces (POSIX and PCRE) to Lua 5.1.
Lrexlib builds into shared libraries called by default rex_posix.so and rex_pcre.so, which can be used with require.
Lrexlib is copyright Reuben Thomas 2000-2007 and copyright Shmuel Zeigerman 2004-2007, and is released under the MIT license.
Most functions and methods in Lrexlib have mandatory and optional arguments. There are no dependencies between arguments in Lrexlib's functions and methods. Any optional argument can be supplied as nil (or omitted if it is a trailing one); the library will then use the default value for that argument.
This document uses the following syntax for optional arguments: they are bracketed separately, and commas are left outside brackets, e.g.:
MyFunc (arg1, arg2, [arg3], [arg4])
Throughout this document, the identifier rex is used in place of either rex_posix or rex_pcre, which are the default namespaces for the corresponding libraries.
All functions receiving a regular expression pattern as an argument will generate an error if that pattern is found invalid by the underlying POSIX or PCRE library.
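The conventions above (nil or omitted optional arguments, and errors on invalid patterns) can be illustrated with a minimal sketch. It assumes the PCRE binding is installed under its default name rex_pcre and that find returns the start and end offsets of the match; the helper name optional_args_demo is hypothetical.

```lua
-- Assumption: the PCRE binding is installed under its default name, rex_pcre.
-- pcall keeps the sketch from raising an error if the module is missing.
local have_rex, rex = pcall(require, "rex_pcre")

-- Hypothetical helper, for illustration only.
local function optional_args_demo(rex)
  -- Trailing optional arguments omitted: init, cf, ef and lo take their defaults.
  local s1, e1 = rex.find("xxabc", "abc")
  -- init and cf passed explicitly as nil: the library substitutes the defaults.
  local s2, e2 = rex.find("xxabc", "abc", nil, nil)
  -- An invalid pattern raises a Lua error; pcall turns it into false plus a message.
  local compiled_ok, err = pcall(rex.new, "(unclosed")
  return s1, e1, s2, e2, compiled_ok, err
end

if have_rex then
  print(optional_args_demo(rex))
end
```

Wrapping calls in pcall is one way to handle patterns that come from user input.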
The default value for compilation flags (cf) that Lrexlib uses when the parameter is not supplied or nil, is:
- 0 for PCRE
- REG_EXTENDED for POSIX regex library
The default value for execution flags (ef) that Lrexlib uses when the parameter is not supplied or nil, is:
- 0 for PCRE
- 0 for standard POSIX regex library
- REG_STARTEND for those POSIX regex libraries that support it, e.g. Spencer's.
rex.match (subj, patt, [init], [cf], [ef], [lo])
The function searches for the first match of the regexp patt in the string subj, starting from offset init, subject to flags cf and ef.
PCRE: A locale lo may be specified.
| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| subj | subject | string | n/a |
| patt | regular expression pattern | string | n/a |
| [init] | start offset in the subject (can be negative) | number | 1 |
| [cf] | compilation flags (bitwise OR) | number | cf |
| [ef] | execution flags (bitwise OR) | number | ef |
| [lo] | [PCRE] locale | string | nil |
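A short sketch of match follows. The return values are not spelled out in this section, so the sketch assumes that match returns the captures of the first match (or the entire match if the pattern has no captures) and that a negative init counts from the end of the subject, as in string.find. The module name rex_pcre and the helper name match_demo are assumptions.

```lua
-- Assumption: the PCRE binding is installed under its default name, rex_pcre.
local have_rex, rex = pcall(require, "rex_pcre")

-- Hypothetical helper, for illustration only.
local function match_demo(rex)
  -- Two captures: both are returned (assumed behaviour, see the lead-in).
  local year, month = rex.match("released 2007-03-15", "(\\d{4})-(\\d{2})")
  -- No captures in the pattern: the entire match is returned.
  local whole = rex.match("released 2007-03-15", "\\d{4}")
  -- Negative init: the search starts 3 characters before the end of the subject.
  local last = rex.match("abc abd", "ab.", -3)
  return year, month, whole, last
end

if have_rex then
  print(match_demo(rex))
end
```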
rex.find (subj, patt, [init], [cf], [ef], [lo])
The function searches for the first match of the regexp patt in the string subj, starting from offset init, subject to flags cf and ef.
PCRE: A locale lo may be specified.
| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| subj | subject | string | n/a |
| patt | regular expression pattern | string | n/a |
| [init] | start offset in the subject (can be negative) | number | 1 |
| [cf] | compilation flags (bitwise OR) | number | cf |
| [ef] | execution flags (bitwise OR) | number | ef |
| [lo] | [PCRE] locale | string | nil |
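The description of find is identical to that of match; the difference lies in what is returned. The sketch below assumes that find returns the start and end offsets of the match followed by the captures, and nil when nothing matches. The module name rex_pcre and the helper name find_demo are assumptions.

```lua
-- Assumption: the PCRE binding is installed under its default name, rex_pcre.
local have_rex, rex = pcall(require, "rex_pcre")

-- Hypothetical helper, for illustration only.
local function find_demo(rex)
  -- Offsets of the whole match first, then the two captures (assumed behaviour).
  local s, e, key, value = rex.find("key=value", "(\\w+)=(\\w+)")
  -- No digit in the subject: the search fails.
  local miss = rex.find("key=value", "\\d")
  return s, e, key, value, miss
end

if have_rex then
  print(find_demo(rex))
end
```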
rex.gmatch (subj, patt, [cf], [ef], [lo])
The function is intended for use in the generic for Lua construct. It returns an iterator for repeated matching of the pattern patt in the string subj, subject to flags cf and ef.
PCRE: A locale lo may be specified.
| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| subj | subject | string | n/a |
| patt | regular expression pattern | string | n/a |
| [cf] | compilation flags (bitwise OR) | number | cf |
| [ef] | execution flags (bitwise OR) | number | ef |
| [lo] | [PCRE] locale | string | nil |
The iterator function is called by Lua. On every iteration (that is, on every match), it returns all captures in the order they appear in the pattern (or the entire match if the pattern specified no captures). The iteration will continue till the subject fails to match.
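The iterator behaviour described above can be sketched as follows: one loop over a pattern without captures (the entire match is returned) and one over a pattern with two captures. The module name rex_pcre and the helper name gmatch_demo are assumptions.

```lua
-- Assumption: the PCRE binding is installed under its default name, rex_pcre.
local have_rex, rex = pcall(require, "rex_pcre")

-- Hypothetical helper, for illustration only.
local function gmatch_demo(rex)
  -- No captures: each iteration yields the entire match.
  local words = {}
  for w in rex.gmatch("one two  three", "\\w+") do
    words[#words + 1] = w
  end
  -- Two captures: each iteration yields both, in pattern order.
  local settings = {}
  for k, v in rex.gmatch("a=1, b=2", "(\\w+)=(\\w+)") do
    settings[k] = v
  end
  return words, settings
end

if have_rex then
  local words, settings = gmatch_demo(rex)
  print(table.concat(words, ","), settings.a, settings.b)
end
```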
rex.gsub (subj, patt, repl, [n], [cf], [ef], [lo])
The function searches for all matches of the pattern patt in the string subj and substitutes the found matches according to the parameter repl (see details below).
PCRE: A locale lo may be specified.
| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| subj | subject | string | n/a |
| patt | regular expression pattern | string | n/a |
| repl | substitution source | string, function or table | n/a |
| [n] | maximum number of matches to search for; unlimited if not supplied | number | nil |
| [cf] | compilation flags (bitwise OR) | number | cf |
| [ef] | execution flags (bitwise OR) | number | ef |
| [lo] | [PCRE] locale | string | nil |

The parameter repl can be either a string, a function or a table. The function behaves differently depending on the repl type:
- If repl is a string, it is treated as a template for the substitution, in which each %X sequence is handled according to the character X:
  - if X represents a digit, then each %X occurrence is substituted by the value of the X-th submatch (capture), with the following cases handled specially:
    - each %0 is substituted by the entire match
    - if the pattern contains no captures, then each %1 is substituted by the entire match
    - any other %X where X is greater than the number of captures in the pattern will generate an error ("invalid capture index")
    - if the pattern does contain a capture with number X but that capture didn't participate in the match, then %X is substituted by an empty string
  - if X is any non-digit character then %X is substituted by X
  - all parts of repl other than %X are copied to the output string verbatim.
- If repl is a function, it is called on each match with the captures as arguments (or with the entire match if the pattern contains no captures). Its first return value determines the substitution:
  - if it is a string then it is used as a substitution for the current match.
  - if it is either of nothing, nil or false then no substitution is made.
  - values of other types generate an error.

  Though gsub is in general consistent with the API and behavior of Lua's string.gsub, it has one extension with regard to string.gsub behavior:
  - if function repl returns more than one value and its second return value is the literal string "break", then gsub stops searching for further matches in the subject and returns.
- If repl is a table, the first capture (or the entire match if the pattern contains no captures) is used as the key for a lookup in repl:
  - If no value is stored under the key but repl has a metatable with the __index field set, then the corresponding metamethod will be called to obtain the value.
  - The obtained value is used for the substitution following exactly the same rules as for the first return value of a function repl, described in the previous item.
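The three kinds of repl can be compared side by side in a small sketch. It assumes that the first return value of gsub is the resulting string (any further return values are ignored here). The module name rex_pcre and the helper name gsub_demo are assumptions.

```lua
-- Assumption: the PCRE binding is installed under its default name, rex_pcre.
local have_rex, rex = pcall(require, "rex_pcre")

-- Hypothetical helper, for illustration only.
local function gsub_demo(rex)
  -- String repl: %1 and %2 refer to the captures, %0 to the entire match.
  local swapped = rex.gsub("hello world", "(\\w+) (\\w+)", "%2 %1")
  local doubled = rex.gsub("abc", "\\w", "%0%0")
  -- Function repl: called with the entire match, since there are no captures.
  local bracketed = rex.gsub("a1b22", "\\d+", function(d) return "<" .. d .. ">" end)
  -- Table repl: the first capture is the lookup key; values are strings.
  local filled = rex.gsub("$name is $age", "\\$(\\w+)", { name = "Tom", age = "7" })
  -- n = 2: only the first two matches are considered.
  local limited = rex.gsub("aaaa", "a", "b", 2)
  return swapped, doubled, bracketed, filled, limited
end

if have_rex then
  print(gsub_demo(rex))
end
```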
rex.split (subj, sep, [cf], [ef], [lo])
The function is intended for use in the generic for Lua construct. It is used for splitting a subject string subj into parts (sections). The sep parameter is a regular expression pattern representing separators between the sections.
The function returns an iterator for repeated matching of the pattern sep in the string subj, subject to flags cf and ef.
PCRE: A locale lo may be specified.
| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| subj | subject | string | n/a |
| sep | separator (regular expression pattern) | string | n/a |
| [cf] | compilation flags (bitwise OR) | number | cf |
| [ef] | execution flags (bitwise OR) | number | ef |
| [lo] | [PCRE] locale | string | nil |
On every iteration pass, the iterator returns:
- A subject section (can be an empty string), followed by
- All captures in the order they appear in the sep pattern (or the entire match if the sep pattern specified no captures). If there is no match (this can occur only in the last iteration), then nothing is returned after the subject section.
The iteration will continue till the end of the subject. Unlike gmatch, there will always be at least one iteration pass, even if there are no matches in the subject.
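A minimal sketch of split, showing the section, the separator returned after it, and the single pass made when the separator never matches. The module name rex_pcre and the helper name split_demo are assumptions.

```lua
-- Assumption: the PCRE binding is installed under its default name, rex_pcre.
local have_rex, rex = pcall(require, "rex_pcre")

-- Hypothetical helper, for illustration only.
local function split_demo(rex)
  -- Only the section is collected; the separator match is ignored.
  local parts = {}
  for section in rex.split("a, b,c", ",\\s*") do
    parts[#parts + 1] = section
  end
  -- sep has no captures, so the entire separator match follows the section.
  -- On the last pass there is no match and sep_text is nil.
  local seps = {}
  for section, sep_text in rex.split("a+b-c", "[+-]") do
    seps[#seps + 1] = sep_text or "(none)"
  end
  -- No separator in the subject: still exactly one pass.
  local passes = 0
  for _ in rex.split("abc", ",") do
    passes = passes + 1
  end
  return parts, seps, passes
end

if have_rex then
  local parts, seps, passes = split_demo(rex)
  print(table.concat(parts, "|"), table.concat(seps, " "), passes)
end
```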
rex.plainfind (subj, patt, [init], [ci])
The function searches for the first match of the string patt in the subject subj, starting from offset init.
- The string patt is not a regular expression; all its characters stand for themselves.
- Both strings subj and patt can have embedded zeros.
- The flag ci specifies case-insensitive search (current locale is used).
- This function uses neither PCRE nor POSIX regex library.
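| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| subj | subject | string | n/a |
| patt | text to find | string | n/a |
| [init] | start offset in the subject (can be negative) | number | 1 |
| [ci] | case insensitive search | boolean | false |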
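A brief sketch of plainfind, assuming it returns the start and end offsets of the match and nil on failure. The module name rex_pcre and the helper name plainfind_demo are assumptions.

```lua
-- Assumption: the PCRE binding is installed under its default name, rex_pcre.
local have_rex, rex = pcall(require, "rex_pcre")

-- Hypothetical helper, for illustration only.
local function plainfind_demo(rex)
  -- "." is an ordinary character here, not a regex metacharacter.
  local s1, e1 = rex.plainfind("a.b axb", ".b")
  -- Case-insensitive search requested through ci = true.
  local s2, e2 = rex.plainfind("Hello World", "world", 1, true)
  -- Default is case-sensitive, so this search fails.
  local miss = rex.plainfind("Hello World", "world")
  return s1, e1, s2, e2, miss
end

if have_rex then
  print(plainfind_demo(rex))
end
```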
rex.new (patt, [cf], [lo])
The function compiles the regular expression patt into a regular expression object whose internal representation corresponds to the library used (PCRE or POSIX regex). The returned result can then be used by the methods tfind, exec and dfa_exec. Regular expression objects are automatically garbage collected.
PCRE: A locale lo may be specified.
| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| patt | regular expression pattern | string | n/a |
| [cf] | compilation flags (bitwise OR) | number | cf |
| [lo] | [PCRE] locale | string | nil |
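One reason to use new is to compile a pattern once and reuse it across many subjects. The sketch below does that with the tfind method, assuming tfind returns nil when there is no match. The module name rex_pcre and the helper name count_matching are assumptions.

```lua
-- Assumption: the PCRE binding is installed under its default name, rex_pcre.
local have_rex, rex = pcall(require, "rex_pcre")

-- Hypothetical helper: counts the lines that match patt.
local function count_matching(rex, lines, patt)
  local r = rex.new(patt)  -- compiled once, reused for every line
  local n = 0
  for _, line in ipairs(lines) do
    if r:tfind(line) then  -- assumed to be nil (falsy) when there is no match
      n = n + 1
    end
  end
  return n
end

if have_rex then
  print(count_matching(rex, { "error: disk", "ok", "error: net" }, "^error:"))
end
```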
rex.flags ([tb])
This function returns a table containing numeric values of the constants defined by the used regex library (either PCRE or POSIX). Those constants are keyed by their names (strings). If the table argument tb is supplied then it is used as the output table, else a new table is created.
The constants contained in the returned table can then be used in most functions and methods where compilation flags or execution flags can be specified. They can also be used for comparing with return codes of some functions and methods for determining the reason of failure. For details, see PCRE and POSIX documentation.
| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| [tb] | a table for placing results into | table | nil |
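The following sketch passes constants from flags as compilation flags. It assumes the PCRE binding, where the constants are keyed without the PCRE_ prefix (for example CASELESS and MULTILINE). Lua 5.1 has no bitwise operators, so the sketch combines two distinct single-bit flags by addition, which is equivalent to a bitwise OR only under that condition. The module name rex_pcre and the helper name flags_demo are assumptions.

```lua
-- Assumption: the PCRE binding is installed under its default name, rex_pcre.
local have_rex, rex = pcall(require, "rex_pcre")

-- Hypothetical helper, for illustration only.
local function flags_demo(rex)
  local f = rex.flags()
  -- Single compilation flag: case-insensitive matching.
  local s1, e1 = rex.find("Hello", "hello", 1, f.CASELESS)
  -- Two distinct flags combined by addition (stands in for bitwise OR).
  local s2, e2 = rex.find("a\nHELLO", "^hello", 1, f.CASELESS + f.MULTILINE)
  -- A supplied table is used as the output table.
  local out = {}
  local same = rex.flags(out)
  return s1, e1, s2, e2, same == out
end

if have_rex then
  print(flags_demo(rex))
end
```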
[PCRE only. See pcre_config in the PCRE docs.]
rex.config ([tb])
This function returns a table containing the values of the configuration parameters used at PCRE library build-time. Those parameters (numbers) are keyed by their names (strings). If the table argument tb is supplied then it is used as the output table, else a new table is created.
| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| [tb] | a table for placing results into | table | nil |
r:tfind (subj, [init], [ef])
The method searches for the first match of the compiled regexp r in the string subj, starting from offset init, subject to execution flags ef.
| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| r | regex object produced by new | userdata | n/a |
| subj | subject | string | n/a |
| [init] | start offset in the subject (can be negative) | number | 1 |
| [ef] | execution flags (bitwise OR) | number | ef |
r:exec (subj, [init], [ef])
The method searches for the first match of the compiled regexp r in the string subj, starting from offset init, subject to execution flags ef.
| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| r | regex object produced by new | userdata | n/a |
| subj | subject | string | n/a |
| [init] | start offset in the subject (can be negative) | number | 1 |
| [ef] | execution flags (bitwise OR) | number | ef |
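The descriptions of tfind and exec are identical; the difference lies in the third return value. The sketch below assumes that both methods return the start and end offsets of the match followed by a table, which holds the captured substrings for tfind and the start and end offsets of each capture for exec. The module name rex_pcre and the helper name tfind_exec_demo are assumptions.

```lua
-- Assumption: the PCRE binding is installed under its default name, rex_pcre.
local have_rex, rex = pcall(require, "rex_pcre")

-- Hypothetical helper, for illustration only.
local function tfind_exec_demo(rex)
  local r = rex.new("(\\d+)-(\\d+)")
  -- tfind: the table is assumed to hold the captured substrings.
  local s1, e1, caps = r:tfind("range 10-20")
  -- exec: the table is assumed to hold start/end offsets for each capture.
  local s2, e2, offsets = r:exec("range 10-20")
  return s1, e1, caps, s2, e2, offsets
end

if have_rex then
  local s1, e1, caps, s2, e2, offsets = tfind_exec_demo(rex)
  print(s1, e1, table.concat(caps, ","))
  print(s2, e2, table.concat(offsets, ","))
end
```

With exec, the substrings can be recovered from the subject with string.sub and the offset pairs, which avoids creating strings for captures that are never used.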
[PCRE 6.0 and later. See pcre_dfa_exec in the PCRE docs.]
r:dfa_exec (subj, [init], [ef], [ovecsize], [wscount])
The method matches a compiled regular expression r against a given subject string subj, using a DFA matching algorithm.
| Parameter | Description | Type | Default Value |
|-----------|-------------|------|---------------|
| r | regex object produced by new | userdata | n/a |
| subj | subject | string | n/a |
| [init] | start offset in the subject (can be negative) | number | 1 |
| [ef] | execution flags (bitwise OR) | number | ef |
| [ovecsize] | size of the array for result offsets | number | 100 |
| [wscount] | number of elements in the working space array | number | 50 |
The following changes are incompatible with Lrexlib version 1.19:
- Lua 5.1 is required
- Functions newPCRE and newPOSIX renamed to new
- Functions flagsPCRE and flagsPOSIX renamed to flags
- Function versionPCRE renamed to version
- Method match renamed to tfind
- Method gmatch removed (similar functionality is provided by function gmatch)
- Method exec: the returned table may additionally contain named subpatterns (PCRE only)
The following changes are incompatible with Lrexlib version 2.0: