Regular Expression Overview#
In Xarpite, you can perform searches and extractions by combining strings with regular expression objects.
Regular expression objects are not ordinary objects; they are primitive values, on par with numbers and strings.
Generating Regular Expression Objects#
Regular expression objects are generated by regular expression literals /pattern/flags.
There is also a function version REGEX.new.
Regular Expression Literals#
Regular expression literals are the most common way to generate regular expression objects.
The flags clause is optional, so a literal takes either the form /pattern/flags or the form /pattern/.
$ xa ' /apple/ '
# /apple/
$ xa ' /apple/i '
# /apple/i
Pattern#
The pattern describes the regular expression pattern.
| String | Meaning |
|---|---|
| \/ | / |
| CRLF | LF |
| CR | LF |
| LF | LF |
| \ followed by 1 character other than / | \ followed by that 1 character |
| Other characters | The character itself |
In general, you can write the regular expression as-is; only / needs to be escaped, as \/.
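The translation rules in the table above can be sketched in a few lines of Python. This is only a model of the documented behavior, not Xarpite's actual parser; the function name literal_to_pattern is hypothetical, and its input is assumed to be the text between the two slashes of a literal.

```python
def literal_to_pattern(body: str) -> str:
    """Model of the table: turn the text between the slashes into a pattern."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # "\/" becomes "/"; any other escape is kept verbatim
            out.append("/" if nxt == "/" else ch + nxt)
            i += 2
        elif ch == "\r":
            # CRLF and a lone CR are both normalized to LF
            out.append("\n")
            i += 2 if body[i + 1 : i + 2] == "\n" else 1
        else:
            # every other character (including LF) stands for itself
            out.append(ch)
            i += 1
    return "".join(out)


print(literal_to_pattern(r"a\/b"))    # a/b
print(literal_to_pattern(r"\d+\."))   # \d+\.
```

Escapes such as \d or \. pass through unchanged, which is why a pattern can usually be written in its raw form.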
Prohibition of Empty Patterns#
The pattern cannot be empty: // is treated as a line comment.
Instead, write an equivalent alternative regular expression like /(?:)/.
Flags#
The flags are a string composed of the following characters, such as i or mig:
| Flag | Meaning |
|---|---|
| m | Multiline mode (^ and $ match line beginnings and endings) |
| i | Ignore case |
| g | Global search (global match) |
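To make the role of each flag character concrete, here is a rough Python analogy. The mapping onto Python's re flags and the helper name compile_with_flags are assumptions made for illustration; Xarpite hands the pattern to a Kotlin engine, and the source does not say how unknown flag characters are treated.

```python
import re

# Assumed correspondence between the flag characters and Python re flags.
_FLAG_BITS = {"m": re.MULTILINE, "i": re.IGNORECASE}


def compile_with_flags(pattern: str, flags: str = ""):
    """Return (compiled_pattern, is_global) for a pattern and a flag string."""
    bits = 0
    for ch in flags:
        if ch == "g":
            continue  # "g" changes how matches are collected, not how they are found
        if ch not in _FLAG_BITS:
            # Assumption: rejecting unknown flags is a choice made for this sketch.
            raise ValueError(f"unknown flag: {ch!r}")
        bits |= _FLAG_BITS[ch]
    return re.compile(pattern, bits), "g" in flags


rx, is_global = compile_with_flags("^apple$", "mi")
print(rx.search("pie\nAPPLE\ntart") is not None, is_global)  # True False
```

With m, the anchors ^ and $ match around the middle line; with i, APPLE matches apple. The g flag is returned separately because it affects the shape of the result rather than the pattern itself.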
Regular Expression Engine#
The pattern is passed to Kotlin’s String.toRegex, so it is interpreted by whichever regular expression engine the platform provides, and the supported syntax may differ between platforms.
REGEX.new Factory Function for Regular Expression Objects#
REGEX.new(pattern: STRING[; flags: STRING]): REGEX
Generates a regular expression object.
This function behaves exactly the same as the regular expression literal /pattern/flags.
$ xa 'REGEX.new("apple")'
# /apple/
$ xa 'REGEX.new("apple"; "i")'
# /apple/i
Property Access for Regular Expression Objects#
Regular expression objects expose two properties: pattern and flags.
$ xa '/apple/i.pattern'
# apple
$ xa '/apple/i.flags'
# i
$ xa '/apple/.flags'
# NULL
Partial Match Determination#
The regex @ string operator can determine whether a regular expression partially matches a string.
$ xa ' /pp/ @ "apple" '
# TRUE
$ xa ' /xy/ @ "apple" '
# FALSE
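"Partially matches" means the pattern may match anywhere inside the string rather than the whole string. As an analogy only (Python's re module, not Xarpite's engine), this is the difference between a search and a full match; the helper name partially_matches is hypothetical.

```python
import re


def partially_matches(pattern: str, text: str) -> bool:
    """True if the pattern matches some substring of text."""
    return re.search(pattern, text) is not None


print(partially_matches("pp", "apple"))           # True
print(partially_matches("xy", "apple"))           # False
print(re.fullmatch("pp", "apple") is not None)    # False: "pp" is not the whole string
```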
Match Operator#
The string =~ regex operator performs regular expression matching on a string.
If matched, a non-NULL match result is returned; if not matched, NULL is returned.
In the match result, result.0 accesses the entire match, and result.1 onward access the string captured by each group.
result[] converts the match result to an array containing the entire match followed by each group.
$ xa -q '
result := "apple" =~ /pp(.)/
OUT << result.0
OUT << result.1
OUT << result[]
'
# ppl
# l
# [ppl;l]
$ xa -q '
result := "apple" =~ /xy(.)/
OUT << result
'
# NULL
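The shape of a match result can be modeled in Python as a list whose index 0 holds the entire match and whose later indexes hold the groups, with None standing in for NULL. This is a sketch under those assumptions; the function name match_once is hypothetical and Python's re module stands in for the real engine.

```python
import re
from typing import List, Optional


def match_once(text: str, pattern: str) -> Optional[List[Optional[str]]]:
    """Model of `string =~ regex`: first match as [whole, group1, ...] or None."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return [m.group(0), *m.groups()]


result = match_once("apple", r"pp(.)")
print(result[0])  # ppl
print(result[1])  # l
print(match_once("apple", r"xy(.)"))  # None
```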
Matching against a Stream#
When the left operand is a stream, item =~ regex is applied to each element, and the results are returned as a stream.
$ xa ' ("Red apple pie", "Yellow banana cake", "Pink cherry tart") =~ / ([a-z]+) / | _.1 '
# apple
# banana
# cherry
Global Match#
Matching with a regular expression with the g flag returns a stream of all match results.
$ xa ' "apple pebble people" =~ /(\w*pl\w*)/g | _.1 '
# apple
# people
$ xa ' ("Red apple pie", "Yellow banana cake", "Pink cherry tart") =~ /([A-Za-z]+)/g | _.1 '
# Red
# apple
# pie
# Yellow
# banana
# cake
# Pink
# cherry
# tart
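The second example shows that a global match over a stream yields one flat stream of results rather than one stream per input string. A Python generator can model this; as before, the name global_match is hypothetical and Python's re module is only a stand-in for the platform engine.

```python
import re
from typing import Iterable, Iterator, List, Optional


def global_match(texts: Iterable[str], pattern: str) -> Iterator[List[Optional[str]]]:
    """Model of a g-flag match over a stream: every match of every item, flattened."""
    rx = re.compile(pattern)
    for text in texts:
        for m in rx.finditer(text):
            # same shape as a single match result: [whole, group1, ...]
            yield [m.group(0), *m.groups()]


words = [r[1] for r in global_match(["apple pebble people"], r"(\w*pl\w*)")]
print(words)  # ['apple', 'people']
```

Passing the three dessert strings with the pattern ([A-Za-z]+) produces the nine words in input order, mirroring the output listed above.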
Calling Regular Expression Objects as Functions#
When a regular expression object is called as a function, it behaves the same as the match operator.
$ xa ' /pp(.)/("apple").1 '
# l