Step 4: Runtime Behavior

Understand how parsers report errors and consume input, and how to control caching to balance memory use against parsing speed.

Parsing Methods

`parseAll(...).getOrThrow()`

Requires the entire input to be consumed:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val number = (+Regex("[0-9]+")).value map { it.toInt() } named "number"

fun main() {
    number.parseAll("123").getOrThrow()      // ✓ Returns 123
    // number.parseAll("123abc").getOrThrow() // ✗ ParseException
    // number.parseAll("abc").getOrThrow()    // ✗ ParseException
}

Exception Types

ParseException - Thrown when no parser matches at the current position or when parsing succeeds but trailing input remains

This exception provides a context property for detailed error information.

Error Context

ParseContext tracks parsing failures to help build user-friendly error messages:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val letter = (+Regex("[a-z]")).value named "letter"
val digit = (+Regex("[0-9]")).value named "digit"
val identifier = letter * (letter + digit).zeroOrMore

fun main() {
    val result = identifier.parseAll("1abc")
    val exception = result.exceptionOrNull() as? ParseException

    check(exception != null)  // Parsing fails
    check((exception.context.errorPosition ?: 0) == 0)  // Failed at position 0

    val expected = exception.context.suggestedParsers.orEmpty()
        .mapNotNull { it.name }
        .distinct()
        .sorted()
        .joinToString(", ")

    check(expected == "letter")  // Expected "letter"
}

Error Tracking Properties

errorPosition - Furthest position attempted during parsing
suggestedParsers - Set of parsers that failed at errorPosition

As parsing proceeds:

When a parser fails at a position beyond errorPosition, errorPosition advances to that position and suggestedParsers is cleared
Parsers failing at the current errorPosition are added to suggestedParsers
Named parsers appear using their assigned names
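
The update rules above can be modeled in a few lines of plain Kotlin. This is a minimal sketch, not Xarpeg's implementation: `FailureTracker` and `recordFailure` are hypothetical names, and parsers are represented only by their names.

```kotlin
// Hypothetical stand-in for the bookkeeping described above; not Xarpeg's actual code.
class FailureTracker {
    var errorPosition: Int? = null
        private set
    val suggestedParsers = linkedSetOf<String>()

    fun recordFailure(parserName: String, position: Int) {
        val furthest = errorPosition
        when {
            // A failure further into the input supersedes everything recorded so far.
            furthest == null || position > furthest -> {
                errorPosition = position
                suggestedParsers.clear()
                suggestedParsers.add(parserName)
            }
            // A failure at the current furthest position is one more suggestion.
            position == furthest -> {
                suggestedParsers.add(parserName)
            }
            // Failures before the furthest position are ignored.
        }
    }
}

fun main() {
    val tracker = FailureTracker()
    tracker.recordFailure("letter", 0)
    tracker.recordFailure("digit", 0)     // same position: added alongside "letter"
    tracker.recordFailure("operator", 2)  // further: replaces both earlier suggestions
    tracker.recordFailure("letter", 1)    // behind the furthest position: ignored
    println("${tracker.errorPosition}: ${tracker.suggestedParsers}")  // prints "2: [operator]"
}
```

Keeping only the furthest failure is what makes the suggestions useful: failures at earlier positions usually belong to alternatives that were abandoned during backtracking.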

Using Error Context with Exceptions

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val number = (+Regex("[0-9]+")).value map { it.toInt() } named "number"
val operator = (+'*' + +'+') named "operator"
val expr = number * operator * number

fun main() {
    val result = expr.parseAll("42 + 10")
    val exception = result.exceptionOrNull() as? ParseException

    check(exception != null)  // Parsing fails
    check((exception.context.errorPosition ?: 0) > 0)  // Error position tracked
    val suggestions = exception.context.suggestedParsers.orEmpty().mapNotNull { it.name }
    check(suggestions.isNotEmpty())  // Has suggestions
}

Rich Error Messages

Use the formatMessage extension function to generate user-friendly error messages that include error position, expected elements, the source line, and a caret indicator pointing to the error location:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val number = (+Regex("[0-9]+")).value map { it.toInt() } named "number"
val operator = +'+' + +'-'
val expr = number * operator * number

fun main() {
    val input = "42*10"
    try {
        expr.parseAll(input).getOrThrow()
    } catch (exception: ParseException) {
        val message = exception.formatMessage()
        val lines = message.lines()
        check(lines[0] == "Syntax Error at 1:3")
        check(lines[1] == "Expect: \"+\", \"-\"")
        check(lines[2] == "Actual: \"*\"")
        check(lines[3] == "42*10")
        check(lines[4] == "  ^")
    }
}

The formatMessage function provides:

Error line and column number
List of expected named parsers (if available)
The actual character found (or EOF)
The source line where the error occurred
A caret (^) symbol indicating the error position
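
To show how a character offset turns into a line and column, a source line, and a caret, here is a minimal sketch in plain Kotlin. `describePosition` is a hypothetical helper written for this page. It mirrors the layout listed above but is not the formatMessage implementation, and it omits the "Expect" line because that depends on the parse context.

```kotlin
// Hypothetical helper; illustrates the layout only, not Xarpeg's formatMessage.
fun describePosition(source: String, offset: Int): List<String> {
    require(offset in 0..source.length) { "offset out of range: $offset" }
    val before = source.substring(0, offset)
    val line = before.count { it == '\n' } + 1      // 1-based line number
    val lineStart = before.lastIndexOf('\n') + 1    // offset of the first character on that line
    val column = offset - lineStart + 1             // 1-based column number
    val lineEnd = source.indexOf('\n', offset).let { if (it == -1) source.length else it }
    val actual = if (offset < source.length) "\"${source[offset]}\"" else "EOF"
    return listOf(
        "Syntax Error at $line:$column",
        "Actual: $actual",
        source.substring(lineStart, lineEnd),
        " ".repeat(column - 1) + "^",               // caret sits under the offending character
    )
}

fun main() {
    describePosition("42*10", 2).forEach(::println)
}
```

The caret line assumes every character before the error occupies one column, so tabs or wide characters in the source line would shift it.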

Memoization and Caching

Default Behavior

DefaultParseContext uses memoization by default to make backtracking predictable:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val parser = (+Regex("[a-z]+")).value named "word"

fun main() {
    // Memoization enabled (default)
    parser.parseAll("hello").getOrThrow()
}

The result of each (parser, position) pair is cached, so a parser retried at the same position returns the cached result instead of running again.

Disabling Memoization

Disable memoization for lower memory usage when your grammar doesn’t backtrack heavily:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val parser = (+Regex("[a-z]+")).value named "word"

fun main() {
    parser.parseAll("hello") { s -> DefaultParseContext(s).also { it.useMemoization = false } }.getOrThrow()
}

Trade-offs:

Memoization enabled - Higher memory use, predictable performance under heavy backtracking
Memoization disabled - Lower memory use, but alternatives that retry the same position repeat the work
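
The trade-off can be made concrete with a toy model that counts how often a parser body actually runs. This is a sketch under stated assumptions: `ToyParser`, `ToyContext`, and `digits` are hypothetical names that do not exist in Xarpeg, and the cache is a plain map keyed by (parser, position).

```kotlin
// Toy model, not Xarpeg's types: a parser maps (input, position) to an end position, or null on failure.
typealias ToyParser = (input: String, position: Int) -> Int?

class ToyContext(private val input: String, private val useMemoization: Boolean = true) {
    private val memo = HashMap<Pair<ToyParser, Int>, Int?>()
    var invocations = 0
        private set

    fun parse(parser: ToyParser, position: Int): Int? {
        val key = parser to position
        // `in` checks containsKey, which tells a cached failure (null) apart from a cache miss.
        if (useMemoization && key in memo) return memo[key]
        invocations++
        val result = parser(input, position)
        if (useMemoization) memo[key] = result
        return result
    }
}

val digits: ToyParser = { input, position ->
    var end = position
    while (end < input.length && input[end].isDigit()) end++
    if (end > position) end else null
}

fun main() {
    val memoized = ToyContext("42-1", useMemoization = true)
    val plain = ToyContext("42-1", useMemoization = false)
    // Two alternatives that both begin with `digits` request the same (parser, position) pair.
    repeat(2) { memoized.parse(digits, 0); plain.parse(digits, 0) }
    println("memoized runs: ${memoized.invocations}, plain runs: ${plain.invocations}")
}
```

Here the memoized context runs `digits` once and the plain context runs it twice. The saving grows with the number of alternatives that share a prefix, while the map grows with every distinct (parser, position) pair attempted.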

State-Dependent Memoization

DefaultParseContext can be subclassed with custom mutable state. If that state affects parsing results, override getState() to partition the memo table by state. This prevents cached results from one state being reused when the state changes:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

class IndentAwareContext(src: String) : DefaultParseContext(src) {
    var indentLevel: Int = 0
    override fun getState(): Any = indentLevel
}

fun main() {
    val parser = +"hello"
    val context = IndentAwareContext("hello")

    context.indentLevel = 0
    check(context.parseOrNull(parser, 0) != null)  // Cached under indentLevel=0

    context.indentLevel = 1
    check(context.parseOrNull(parser, 0) != null)  // Re-evaluated under indentLevel=1
}

The default getState() returns Unit, so all results share a single memo table, which is equivalent to standard memoization. The returned value is used as a Map key, so it must implement equals and hashCode consistently.
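
The key requirement is easier to see in a small model of a partitioned memo table. The sketch below is plain Kotlin and only assumes the behavior described above: `PartitionedMemo` and `getOrCompute` are hypothetical names, not part of Xarpeg.

```kotlin
// Hypothetical model of a memo table partitioned by a state key; not Xarpeg's internals.
class PartitionedMemo<R : Any> {
    private val tables = HashMap<Any, HashMap<Int, R>>()
    var computations = 0
        private set

    fun getOrCompute(state: Any, position: Int, compute: () -> R): R {
        // One inner table per distinct state; lookup relies on the state's equals and hashCode.
        val table = tables.getOrPut(state) { HashMap() }
        return table.getOrPut(position) {
            computations++
            compute()
        }
    }
}

fun main() {
    val memo = PartitionedMemo<String>()
    val indentStack = mutableListOf(0)

    // toList() takes an immutable snapshot, so later mutation cannot corrupt the key.
    memo.getOrCompute(indentStack.toList(), 0) { "indent=${indentStack.last()}" }
    indentStack.add(4)
    memo.getOrCompute(indentStack.toList(), 0) { "indent=${indentStack.last()}" }
    indentStack.removeAt(indentStack.lastIndex)
    val again = memo.getOrCompute(indentStack.toList(), 0) { "recomputed" }

    println("$again after ${memo.computations} computations")  // prints "indent=0 after 2 computations"
}
```

Lists compare by content, so returning to an earlier indentation state finds the results cached for it. A key that uses identity equality, or a mutable object that is modified after being stored, would defeat this lookup.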

Error Propagation

If a map function throws an exception, it bubbles up and aborts parsing:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val divisionByZero = (+Regex("[0-9]+")).value map { value ->
    val n = value.toInt()
    if (n == 0) error("Cannot divide by zero")
    100 / n
} named "number"

fun main() {
    divisionByZero.parseAll("10").getOrThrow()  // ✓ Returns 10
    // divisionByZero.parseAll("0").getOrThrow()  // ✗ IllegalStateException
}

Validate before mapping or catch and wrap errors when recovery is needed.
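
Both options can be sketched as plain Kotlin helpers that a map lambda might call. The names `Quotient`, `divideHundredBy`, and `divideHundredByOrFailure` are hypothetical, and how the returned values are threaded through a grammar is left open here.

```kotlin
// Hypothetical result type for a mapping step that can reject its input without throwing.
sealed interface Quotient {
    data class Value(val value: Int) : Quotient
    data class Invalid(val reason: String) : Quotient
}

// Validate first: describe the problem as a value instead of throwing inside the mapping step.
fun divideHundredBy(text: String): Quotient {
    val n = text.toIntOrNull() ?: return Quotient.Invalid("not a number: $text")
    if (n == 0) return Quotient.Invalid("cannot divide by zero")
    return Quotient.Value(100 / n)
}

// Catch and wrap: keep the throwing logic, but convert any exception into a Result.
fun divideHundredByOrFailure(text: String): Result<Int> = runCatching {
    val n = text.toInt()
    if (n == 0) error("Cannot divide by zero")
    100 / n
}

fun main() {
    println(divideHundredBy("10"))                              // Value(value=10)
    println(divideHundredBy("0"))                               // Invalid(reason=cannot divide by zero)
    println(divideHundredByOrFailure("0").exceptionOrNull()?.message)  // Cannot divide by zero
}
```

The first form keeps failures visible in the type, which suits recoverable input errors. The second form is the smaller change when the throwing code already exists.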

Debugging Tips

Inspect Error Details from Result

Access the error context from the parse result:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val parser = +Regex("[a-z]+") named "word"

fun main() {
    val result = parser.parseAll("123")
    val exception = result.exceptionOrNull() as? ParseException

    check(exception != null)  // Parsing fails
    check((exception.context.errorPosition ?: 0) == 0)  // Error at position 0
    check(exception.context.suggestedParsers?.any { it.name == "word" } == true)  // Suggests "word"
}

Check Rewind Behavior

Confirm how optional and zeroOrMore rewind on failure:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

val parser = (+Regex("[a-z]+") named "letters").optional * +Regex("[0-9]+") named "digits"

fun main() {
    // The optional letters parser matches nothing and rewinds, so the digits parser can succeed
    val result = parser.parseAll("123").getOrThrow()
    check(result != null)  // Succeeds
}
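
What "rewind" means can be shown with a toy combinator model in plain Kotlin. This is a sketch, not Xarpeg's implementation: `Toy`, `regexToy`, `optional`, and `then` are hypothetical names, and a parser here is just a function returning a value and an end position.

```kotlin
// Toy model, not Xarpeg's types: a parser returns (value, end position) or null on failure.
typealias Toy<T> = (input: String, start: Int) -> Pair<T, Int>?

fun regexToy(pattern: String): Toy<String> {
    val regex = Regex(pattern)
    return { input, start ->
        // Accept a match only if it begins exactly at `start`.
        regex.find(input, start)
            ?.takeIf { it.range.first == start }
            ?.let { it.value to it.range.last + 1 }
    }
}

fun <T> optional(parser: Toy<T>): Toy<T?> = { input, start ->
    // On failure, succeed with null and report `start` as the end position: nothing is consumed.
    parser(input, start) ?: Pair<T?, Int>(null, start)
}

fun <A, B> then(first: Toy<A>, second: Toy<B>): Toy<Pair<A, B>> = { input, start ->
    first(input, start)?.let { (a, mid) ->
        second(input, mid)?.let { (b, end) -> (a to b) to end }
    }
}

fun main() {
    val parser = then(optional(regexToy("[a-z]+")), regexToy("[0-9]+"))
    println(parser("123", 0))     // ((null, 123), 3)
    println(parser("abc123", 0))  // ((abc, 123), 6)
}
```

For the input "123", the optional part reports end position 0, so the digits parser starts at the beginning of the input as if the letters parser had never been tried.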

Use Tests as Reference

Check the test suite for observed behavior:

ErrorContextTest.kt - Error tracking examples
ParserTest.kt - Comprehensive behavior tests
MemoizationStateTest.kt - State-dependent memoization tests

Extending ParseContext

ParseContext is an interface. Its default implementation, DefaultParseContext, is declared as an open class, so you can extend it with custom state for specialized parsing needs.

Example: Indent-Based Language Support

You can extend DefaultParseContext to track indentation levels for Python-style languages:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

fun main() {
    class IndentParseContext(
        src: String,
    ) : DefaultParseContext(src) {
        private val indentStack = mutableListOf(0)

        val currentIndent: Int get() = indentStack.last()
        val isInIndentBlock: Boolean get() = indentStack.size > 1

        fun pushIndent(indent: Int) {
            require(indent > currentIndent)
            indentStack.add(indent)
        }

        fun popIndent() {
            require(indentStack.size > 1)
            indentStack.removeLast()
        }

        // Required: return a snapshot of mutable state so that memoization
        // uses a separate cache table for each distinct indentation state.
        override fun getState(): Any = indentStack.toList()
    }

    val ctx = IndentParseContext("source")
    check(ctx.currentIndent == 0)
    check(!ctx.isInIndentBlock)

    ctx.pushIndent(4)
    check(ctx.currentIndent == 4)
    check(ctx.isInIndentBlock)

    ctx.popIndent()
    check(ctx.currentIndent == 0)
    check(!ctx.isInIndentBlock)
}

See the online-parser sample source code for a complete implementation.

Key Takeaways

parseAll(...).getOrThrow() requires full consumption, throws on failure
Error context provides errorPosition and suggestedParsers
Named parsers appear in error messages with their assigned names
Memoization is enabled by default; disable by providing a custom context factory. Override getState() in subclasses for state-dependent memoization
Exceptions in map bubble up and abort parsing
parseOrNull with ParseContext enables detailed debugging
DefaultParseContext is extensible for custom parsing requirements

Next Steps

Learn how to extract position information for error reporting and source mapping.

→ Step 5: Parsing Positions
