Step 2: Combinators
Learn to combine parsers using sequences, choices, repetition, and more to build complex grammars.
Core Combinators
Choice with +
Try alternatives in order. The first match wins:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val keyword = (+"if" + +"while" + +"for") named "keyword"
fun main() {
    keyword.parseAll("if").getOrThrow() // ✓ matches "if"
    keyword.parseAll("while").getOrThrow() // ✓ matches "while"
}
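Because the first matching alternative wins, order matters when one alternative is a prefix of another. The sketch below is not taken from the library's documentation: it assumes standard PEG ordered-choice semantics (a failed alternative rewinds before the next one is tried) and uses only the operators shown above. The name `typeKeyword` is made up for illustration.

```kotlin
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

// Hypothetical example: the longer alternative is listed first so that
// "int" is not cut short by the shorter alternative "in".
val typeKeyword = (+"int" + +"in") named "typeKeyword"

fun main() {
    check(typeKeyword.parseAll("int").getOrThrow() == "int") // first alternative matches
    check(typeKeyword.parseAll("in").getOrThrow() == "in")   // "int" fails, then "in" matches
}
```

With the order reversed (`+"in" + +"int"`), the input "int" would most likely be rejected by parseAll, because "in" matches first and leaves "t" unconsumed.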
Optional Parsing
optional attempts to match but rewinds on failure. Returns Tuple1<T?>:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val sign = (+'+' map { '+' }) + (+'-' map { '-' })
val signOpt = sign.optional map { it.a ?: '+' }
val unsigned = +Regex("[0-9]+") map { it.value.toInt() } named "number"
val signedInt = signOpt * unsigned map { (s, value) ->
    if (s == '-') -value else value
}
fun main() {
    check(signedInt.parseAll("-42").getOrThrow() == -42)
    check(signedInt.parseAll("99").getOrThrow() == 99)
}
}
Use it.a to access the optional value, or destructure with map { (value) -> ... }.
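The destructuring form mentioned above can be sketched as follows. This is a minimal illustration that reuses only constructs from the previous example; `negative`, `magnitude`, and `signedValue` are hypothetical names, not part of the library.

```kotlin
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

// Destructure the Tuple1<Char?> in the lambda parameter instead of reading it.a.
val negative = (+'-').optional map { (minus) -> minus != null }
val magnitude = +Regex("[0-9]+") map { it.value.toInt() } named "magnitude"

// The Boolean flag and the Int are combined exactly like signOpt and unsigned above.
val signedValue = negative * magnitude map { (isNegative, value) ->
    if (isNegative) -value else value
}

fun main() {
    check(signedValue.parseAll("-7").getOrThrow() == -7)
    check(signedValue.parseAll("7").getOrThrow() == 7)
}
```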
Combining Optionals with Tuples
When combining multiple optional parsers using *, tuples are automatically flattened to contain nullable values directly:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val optA = (+'a').optional
val optB = (+'b').optional
val combined = optA * optB
fun main() {
    // Result type is Tuple2<Char?, Char?> (flattened)
    // NOT Tuple2<Tuple1<Char?>, Tuple1<Char?>> (nested)
    val result1 = combined.parseAll("ab").getOrThrow()
    check(result1.a == 'a') // Direct access to nullable Char
    check(result1.b == 'b')
    val result2 = combined.parseAll("a").getOrThrow()
    check(result2.a == 'a')
    check(result2.b == null) // Missing optional is null
}
This flattening makes optional combinations more ergonomic—you work with nullable types directly instead of nested tuples.
Repetition
Collect multiple matches into a list:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val digits = (+Regex("[0-9]") map { it.value } named "digit").oneOrMore map { matches ->
    matches.joinToString("")
}
val letters = (+Regex("[a-z]") map { it.value } named "letter").zeroOrMore map { matches ->
    matches
}
fun main() {
    digits.parseAll("123").getOrThrow() // => "123"
    letters.parseAll("abc").getOrThrow() // => ["a", "b", "c"]
    letters.parseAll("").getOrThrow() // => []
}
- .zeroOrMore - Matches zero or more times (never fails)
- .oneOrMore - Matches one or more times (fails if no match)
- .list(min, max) - Matches between min and max times
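A bounded repetition might look like the sketch below. It assumes that `list` accepts the two bounds positionally, as written in the list above; the exact signature is not shown in this tutorial, so check it against the library before relying on it. The names `hexDigit` and `shortHex` are illustrative.

```kotlin
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val hexDigit = +Regex("[0-9a-f]") map { it.value } named "hexDigit"

// Assumption: list(2, 4) means "at least 2 and at most 4 matches".
val shortHex = hexDigit.list(2, 4) map { it.joinToString("") }

fun main() {
    check(shortHex.parseAll("abc").getOrThrow() == "abc")  // 3 matches, within bounds
    check(shortHex.parseAll("f").exceptionOrNull() != null) // 1 match, below the minimum
}
```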
Serial Parsing
When you need to run several different parsers of the same result type in sequence without hitting tuple size limits, use serial:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val article = +"the" + +"a"
val adjective = +"quick" + +"lazy"
val noun = +"fox" + +"dog"
val phrase = serial(article, +" ", adjective, +" ", noun)
fun main() {
    check(phrase.parseAll("the quick fox").getOrThrow() == listOf("the", " ", "quick", " ", "fox"))
    check(phrase.parseAll("a lazy dog").getOrThrow() == listOf("a", " ", "lazy", " ", "dog"))
}
serial returns a List<T> and has no theoretical upper limit, unlike tuple parsers which are limited to 16 elements. Use it when:
- You have many parsers to combine (especially beyond tuple limits)
- You are parsing a long natural-language phrase in which each part is chosen from a set of alternatives
- You want a list result instead of a tuple
For repeating the same parser, use .list() or .oneOrMore instead.
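Since serial yields an ordinary list, the result can be post-processed with the usual collection functions inside map. The sketch below reuses the phrase grammar from above and drops the separator entries; `words` is a hypothetical name, and the filtering step is only one possible design.

```kotlin
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val article = +"the" + +"a"
val adjective = +"quick" + +"lazy"
val noun = +"fox" + +"dog"

// Keep only the words; the single-space separators are filtered out of the list.
val words = serial(article, +" ", adjective, +" ", noun) map { parts ->
    parts.filter { it != " " }
}

fun main() {
    check(words.parseAll("the quick fox").getOrThrow() == listOf("the", "quick", "fox"))
    check(words.parseAll("a lazy dog").getOrThrow() == listOf("a", "lazy", "dog"))
}
```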
Shaping Results
Sequences with * return tuples. Use -parser to drop unneeded values:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
// Without dropping: Tuple3<Char, MatchResult, Char>
val word = +Regex("[a-z]+") named "word"
val withDelimiters = +'(' * word * +')'
// With dropping: MatchResult (just the middle value)
val cleanResult = -'(' * word * -')' map { it.value }
fun main() {
    cleanResult.parseAll("(hello)").getOrThrow() // => "hello"
}
Destructure tuples in map to transform results:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val wordPart = +Regex("[a-z]+") named "word"
val numPart = +Regex("[0-9]+") named "number"
val pair = wordPart * -',' * numPart map { (word, num) ->
    word.value to num.value.toInt()
}
fun main() {
    pair.parseAll("hello,42").getOrThrow() // => ("hello", 42)
}
Input Boundaries
startOfInput and endOfInput match at position boundaries without consuming input:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val word = +Regex("[a-z]+") map { it.value } named "word"
fun main() {
    // Matches at start of input
    val atStart = (startOfInput * word).parseAll("hello").getOrThrow()
    check(atStart == "hello") // Succeeds
}
Note: When using parseAll(...).getOrThrow(), boundary checks are redundant—it already verifies the entire input is consumed. Use these parsers with parseOrNull or within sub-grammars.
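The example above shows only startOfInput. A sketch with both boundaries follows; it assumes that endOfInput, like startOfInput, contributes no value to the resulting tuple. parseAll is used here only because it is the entry point demonstrated in this tutorial, so the boundary checks are redundant in this particular call. The name `wholeWord` is illustrative.

```kotlin
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val word = +Regex("[a-z]+") map { it.value } named "word"

// Assumption: both boundary parsers are dropped from the result, leaving just the String.
val wholeWord = startOfInput * word * endOfInput

fun main() {
    check(wholeWord.parseAll("hello").getOrThrow() == "hello")
    // "1" follows the word, so the end-of-input check cannot succeed.
    check(wholeWord.parseAll("hello1").exceptionOrNull() != null)
}
```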
Naming Parsers
Assign names for better error messages:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val digit = +Regex("[0-9]") named "digit"
val letter = +Regex("[a-z]") named "letter"
val identifier = (letter * (letter + digit).zeroOrMore) named "identifier"
fun main() {
    val result = identifier.parseAll("123abc")
    val exception = result.exceptionOrNull() as? ParseException
    check(exception != null) // Parsing fails
    check(exception.message!!.contains("Syntax Error"))
}
Named Composite Parsers
Named composite parsers hide constituent parsers from error suggestions:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
fun main() {
    val parserA = +'a' named "letter_a"
    val parserB = +'b' named "letter_b"
    // Named composite: only "ab_sequence" in errors
    val namedComposite = (parserA * parserB) named "ab_sequence"
    // Unnamed composite: "letter_a" in errors
    val unnamedComposite = parserA * parserB
    val result1 = namedComposite.parseAll("c")
    val exception1 = result1.exceptionOrNull() as? ParseException
    val names1 = exception1?.context?.suggestedParsers?.mapNotNull { it.name } ?: emptyList()
    check(names1.contains("ab_sequence"))
    val result2 = unnamedComposite.parseAll("c")
    val exception2 = result2.exceptionOrNull() as? ParseException
    val names2 = exception2?.context?.suggestedParsers?.mapNotNull { it.name } ?: emptyList()
    check(names2.contains("letter_a"))
}
Best practice: Name a composite parser when you want semantic errors (“Expected: identifier”), and leave the composite unnamed when you want detailed token-level errors from its components, for example during development.
Hidden Parsers
Sometimes parsers need to be tracked internally but shouldn’t clutter error suggestions. Use .hidden for parsers like whitespace that can appear anywhere:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
fun main() {
    val whitespace = (+Regex("\\s+")).hidden
    val number = +Regex("[0-9]+") named "number" map { it.value.toInt() }
    val operator = (+'*' + +'+') named "operator"
    // Parser that optionally accepts whitespace
    val expr = number * whitespace.optional * operator * whitespace.optional * number
    val result = expr.parseAll("42abc") // Fails: expected operator or number
    val exception = result.exceptionOrNull() as? ParseException
    check(exception != null)
    val suggestions = exception.context.suggestedParsers.orEmpty().mapNotNull { it.name?.ifEmpty { null } }
    // Contains meaningful parsers but not hidden whitespace
    check(suggestions.contains("operator") || suggestions.contains("number"))
    check(!suggestions.contains(""))
}
.hidden is equivalent to named("") - it sets the parser name to an empty string, which excludes it from error suggestions while still tracking it internally.
Use case: Apply to parsers that can appear anywhere (whitespace, comments) to keep error messages focused on meaningful tokens.
Key Takeaways
- + for alternatives (first match wins)
- .optional rewinds on failure, returns Tuple1<T?>
- .zeroOrMore / .oneOrMore collect matches into lists
- -parser drops values from tuples
- Destructuring in map transforms tuple results
- startOfInput / endOfInput match boundaries
- named improves error messages
- .hidden excludes parsers from error suggestions
Next Steps
Learn how to handle recursive grammars and operator precedence.