Regular Expression Overview

In Xarpite, you can perform searches and extractions by combining strings with regular expression objects.

Regular expression objects are not ordinary objects; they are primitive values, on a par with numbers and strings.

Generating Regular Expression Objects

Regular expression objects are generated with regular expression literals of the form /pattern/flags.

The REGEX.new function does the same thing in function form.

Regular Expression Literals

Regular expression literals are the most common way to generate regular expression objects.

The flags part is optional, so a literal takes one of the following forms:

  • /pattern/flags
  • /pattern/

$ xa ' /apple/ '
# /apple/

$ xa ' /apple/i '
# /apple/i

Pattern

The pattern part contains the regular expression itself. The text between the slashes is interpreted as follows:

String                       Meaning
\/                           /
CRLF                         LF
CR                           LF
LF                           LF
\ (1 char other than /)      \ followed by that 1 character
Other characters             The character itself

In most cases, you can therefore write a regular expression as-is, without additional escaping.
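
The rules in the table above can be sketched in Python. This is only an illustration of the described behavior, not Xarpite's actual implementation; the function name normalize_literal_body is hypothetical, and its input is assumed to be the text found between the two slashes of a literal.

```python
def normalize_literal_body(body: str) -> str:
    """Hypothetical helper: apply the table's rules to the text between the slashes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # \/ becomes a plain slash; any other escape is kept verbatim.
            out.append("/" if nxt == "/" else ch + nxt)
            i += 2
        elif ch == "\r":
            # CRLF and a lone CR are both normalized to LF.
            out.append("\n")
            i += 2 if body[i + 1 : i + 2] == "\n" else 1
        else:
            # Every other character stands for itself.
            out.append(ch)
            i += 1
    return "".join(out)


print(normalize_literal_body(r"a\/b"))      # a/b
print(normalize_literal_body(r"\w+\.txt"))  # \w+\.txt
```

Because escapes other than \/ pass through unchanged, sequences such as \w or \. reach the regular expression engine exactly as written.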

Prohibition of Empty Patterns

The pattern cannot be empty; // is treated as a line comment instead.

To express an empty pattern, write an equivalent regular expression such as /(?:)/.
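
As a rough illustration of why (?:) works as a stand-in, the following Python sketch shows that an empty non-capturing group matches the empty string. Python's re module is used here only as an analogy; Xarpite delegates to Kotlin's engine, whose behavior is assumed to be the same on this point.

```python
import re

# An empty non-capturing group: matches the empty string at any position.
empty = re.compile(r"(?:)")

m = empty.search("apple")
print(repr(m.group(0)), m.span())  # '' (0, 0)
```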

Flags

The flags part is a string made up of the following characters, for example i or mig:

  • m Multiline mode (^ and $ match line beginnings and endings)
  • i Ignore case
  • g Global search (global match)
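
One way to picture how such a flags string is handled is sketched below in Python. This is an analogy, not Xarpite's implementation: compile_with_flags is a hypothetical helper, the mapping to Python's re.MULTILINE and re.IGNORECASE is an assumption, and rejecting unknown characters is a choice made for this sketch only. The g flag is tracked separately because it changes how matching is performed rather than how the pattern is compiled.

```python
import re


def compile_with_flags(pattern: str, flags: str = ""):
    """Hypothetical helper: returns (compiled regex, is_global)."""
    options = 0
    is_global = False
    for ch in flags:
        if ch == "m":
            options |= re.MULTILINE   # ^ and $ match at line boundaries
        elif ch == "i":
            options |= re.IGNORECASE  # case-insensitive matching
        elif ch == "g":
            is_global = True          # affects matching, not compilation
        else:
            # Assumption of this sketch: unknown flags are an error.
            raise ValueError(f"unknown flag: {ch!r}")
    return re.compile(pattern, options), is_global


regex, is_global = compile_with_flags("^apple$", "mi")
print(regex.search("pie\nAPPLE\ntart") is not None)  # True
print(is_global)                                     # False
```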

Regular Expression Engine

Internally, the pattern is passed to Kotlin’s String.toRegex, so it is interpreted by whichever regular expression engine the target platform provides. Details of pattern syntax and behavior may therefore differ between platforms.

REGEX.new: Factory Function for Regular Expression Objects

REGEX.new(pattern: STRING[; flags: STRING]): REGEX

Generates a regular expression object.

This function behaves exactly the same as the regular expression literal /pattern/flags.

$ xa 'REGEX.new("apple")'
# /apple/

$ xa 'REGEX.new("apple"; "i")'
# /apple/i

Property Access for Regular Expression Objects

Regular expression objects expose the following properties:

  • pattern: the pattern string
  • flags: the flags string, or NULL if the object has no flags

$ xa '/apple/i.pattern'
# apple

$ xa '/apple/i.flags'
# i

$ xa '/apple/.flags'
# NULL

Partial Match Determination

The regex @ string operator tests whether the regular expression matches any part of the string.

$ xa ' /pp/ @ "apple" '
# TRUE

$ xa ' /xy/ @ "apple" '
# FALSE
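
For readers more familiar with other regex APIs, the following Python sketch shows the same idea. It is an analogy only: partial_match is a hypothetical name, and re.search stands in for the @ operator. The last line contrasts this with a whole-string match, which is a different operation.

```python
import re


def partial_match(pattern: str, text: str) -> bool:
    """True if the pattern matches anywhere inside text."""
    return re.search(pattern, text) is not None


print(partial_match("pp", "apple"))             # True
print(partial_match("xy", "apple"))             # False
print(re.fullmatch("pp", "apple") is not None)  # False: "pp" is not the whole string
```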

Match Operator

The string =~ regex operator matches a string against a regular expression.

If the string matches, a non-NULL match result is returned; otherwise, NULL is returned.

In a match result, result.0 is the entire match, and result.1 onward are the strings captured by each group.

result[] converts the match result to an array containing the entire match followed by each group.

$ xa -q '
  result := "apple" =~ /pp(.)/

  OUT << result.0
  OUT << result.1
  OUT << result[]
'
# ppl
# l
# [ppl;l]

$ xa -q '
  result := "apple" =~ /xy(.)/

  OUT << result
'
# NULL
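
The shape of the match result can be mimicked in Python as follows. This is a sketch under assumptions, not Xarpite's implementation: the helper name match is hypothetical, a plain list stands in for the match result, and None stands in for NULL.

```python
import re


def match(text: str, pattern: str):
    """Return [entire match, group 1, group 2, ...], or None if nothing matches."""
    m = re.search(pattern, text)
    return None if m is None else [m.group(0), *m.groups()]


result = match("apple", r"pp(.)")
print(result[0])                  # ppl
print(result[1])                  # l
print(result)                     # ['ppl', 'l']
print(match("apple", r"xy(.)"))   # None
```

Index 0 corresponds to result.0, index 1 to result.1, and the whole list to result[].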

Matching against a Stream

When the left-hand side is a stream, item =~ regex is applied to each element, and the results are returned as a stream.

$ xa ' ("Red apple pie", "Yellow banana cake", "Pink cherry tart") =~ / ([a-z]+) / | _.1 '
# apple
# banana
# cherry
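
The element-wise behavior resembles mapping a match over a lazy sequence. The Python sketch below uses a generator as a stand-in for a stream; match_each is a hypothetical name, and yielding None for elements that do not match is an assumption of this sketch rather than documented Xarpite behavior.

```python
import re


def match_each(items, pattern: str):
    """Lazily apply the regex to each item, yielding one match object (or None) per item."""
    regex = re.compile(pattern)
    for item in items:
        yield regex.search(item)


items = ["Red apple pie", "Yellow banana cake", "Pink cherry tart"]
words = [m.group(1) for m in match_each(items, r" ([a-z]+) ") if m is not None]
print(words)  # ['apple', 'banana', 'cherry']
```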

Global Match

Matching against a regular expression that has the g flag returns a stream of all match results instead of a single match result.

$ xa ' "apple pebble people" =~ /(\w*pl\w*)/g | _.1 '
# apple
# people

$ xa ' ("Red apple pie", "Yellow banana cake", "Pink cherry tart") =~ /([A-Za-z]+)/g | _.1 '
# Red
# apple
# pie
# Yellow
# banana
# cake
# Pink
# cherry
# tart
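
A comparable operation in Python is re.finditer, which yields every non-overlapping match in turn. The sketch below is an analogy only; global_match is a hypothetical name, and each result is represented as a list of the entire match followed by its groups.

```python
import re


def global_match(text: str, pattern: str):
    """Lazily yield [entire match, group 1, ...] for every match in text."""
    for m in re.finditer(pattern, text):
        yield [m.group(0), *m.groups()]


found = [r[1] for r in global_match("apple pebble people", r"(\w*pl\w*)")]
print(found)  # ['apple', 'people']
```

"pebble" is skipped because it does not contain the substring "pl".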

Calling Regular Expression Objects as Functions

Calling a regular expression object as a function, with the string as its argument, behaves the same as the match operator.

$ xa ' /pp(.)/("apple").1 '
# l