Regular Expression (re) Verbs
- re.compile: Compiles pattern into internal byte-code representation. New verb.
- re.expand: Replaces any reference to a numbered or named capturing parenthesis with the value specified for it in the match info table. New verb.
- re.extract: Extracts all matches of a pattern in the target string into a list. New verb.
- re.getPatternInfo: Compiles pattern into the internal byte-code representation for a regular expression, taking the specified options into account. New verb.
- re.grep: Loops over all lines or elements of a string and attempts to match a compiled pattern against each one. New verb.
- re.join: Joins all items in a string list into a delimited string object, using a specified delimiter between items. New verb.
- re.match: Attempts to match the compiled pattern against successive character positions in the target string. New verb.
- re.replace: Attempts to match the compiled pattern against successive character positions in the target string, replacing each match. New verb.
- re.split: Attempts to match the compiled pattern against successive character positions in the target string, using the pattern as a delimiter to split the target string into fields. New verb.
- re.visit: Traverses all matches in the target string and calls the callback script for each match. New verb.
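For readers who know regular expressions from another environment, the operations listed above can be sketched with rough analogues. The example below uses Python's standard re module purely as an analogy. It is not the UserTalk API: the parameters and return values of the Frontier verbs are not shown on this page, and the sample text and variable names are made up for illustration.

```python
import re

text = "alpha=1, beta=22, gamma=333"

# Compile once and reuse the compiled pattern (compare re.compile).
pattern = re.compile(r"(\w+)=(\d+)")

# Collect every match into a list (compare re.extract).
all_matches = [m.group(0) for m in pattern.finditer(text)]

# Try successive positions until the pattern matches (compare re.match).
first = pattern.search(text)

# Replace each match; \1 and \2 refer to the numbered capturing
# parentheses (compare re.replace and re.expand).
swapped = pattern.sub(r"\2=\1", text)

# Use a pattern as the field delimiter (compare re.split).
fields = re.split(r",\s*", text)

# Rejoin the fields with a chosen delimiter (compare re.join).
joined = ";".join(fields)

# Run a callback for each match (compare re.visit).
names = []
for m in pattern.finditer(text):
    names.append(m.group(1))

# Test every line against a pattern (compare re.grep).
lines = ["error: disk full", "ok", "error: timeout"]
hits = [ln for ln in lines if re.search(r"^error", ln)]
```

The sketch compiles the pattern once and passes the compiled object to the later operations, which mirrors the way most of the re verbs are described as working on a compiled pattern.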
Regular Expression Syntax
This section describes the regular expression syntax used by the re kernel verbs, which were added in Frontier 9.1b3. The kernel implementation of the re verbs is based on version 4.2 of the PCRE library, and the full regular expression syntax is covered by the PCRE documentation. The following exceptions and Frontier-specific notes apply:
- If you specify a regular expression as a literal string in a UserTalk script, you will have to double all backslashes. Any single backslashes in a literal string will be interpreted by Frontier (and removed) before the string is actually passed to the re.compile kernel verb.
- All aspects of memory management are handled by Frontier, so you can safely ignore all mentions of pcre_malloc and pcre_free in the text.
- All error messages received from the PCRE library are converted to comply with the conventions for UserTalk error messages. Messages about syntax errors will usually include a reference to the position of the offending character.
- If pcre_exec is mentioned, it can usually be assumed that what is said applies to all re verbs that take a compiled pattern as their first argument.
- Any information about matched subpatterns is passed back to the caller in a so-called MatchInfo table. (The scripter does not have direct access to the ovector sometimes mentioned in conjunction with pcre_exec.)
- High-ASCII characters, as used in several European languages, are not recognized as "word" characters, i.e., they are not matched by \w.
- Support for UTF-8 strings has not been enabled.
- Support for callouts has not been implemented.
- The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_EXTRA, and PCRE_DOLLAR_ENDONLY options are not supported. All other options are accessible via the boolean parameters of re.compile.
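Two of the notes above, backslash doubling in literal strings and \w not matching high-ASCII characters, are easier to see in a small example. The sketch below uses Python's standard re module as a stand-in, on the assumption that its non-raw string literals and its ASCII-only mode behave comparably to what is described here. It does not call the Frontier verbs, and the sample strings are invented.

```python
import re

# In an ordinary (non-raw) string literal the language consumes one level
# of backslashes before the regex engine sees the pattern, so each
# backslash meant for the engine has to be written twice.
doubled = "\\d+\\.\\d+"

# What the regex engine actually receives: \d+\.\d+
received = doubled

# The doubled literal matches a version number such as "4.2".
version = re.search(doubled, "PCRE v4.2").group(0)

# With ASCII-only character classes, accented letters are not "word"
# characters, so \w+ stops at them and words are broken apart.
ascii_words = re.findall(r"\w+", "Müller café", flags=re.ASCII)
```

In this sketch the accented letters split "Müller" into two pieces and cut the last letter off "café". A pattern that relies on \w to match whole words in such text would need an explicit character class instead.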
Resources
- The source code for the PCRE library is available for download from ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/