Step 4: Runtime Behavior
Understand how parsers handle errors, consume input, and control caching for optimal performance.
Parsing Methods
parseAll(...).getOrThrow()
Requires the entire input to be consumed:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val number = (+Regex("[0-9]+")).value map { it.toInt() } named "number"
fun main() {
    number.parseAll("123").getOrThrow() // ✓ Returns 123
    // number.parseAll("123abc").getOrThrow() // ✗ ParseException
    // number.parseAll("abc").getOrThrow() // ✗ ParseException
}
Exception Types
- ParseException - Thrown when no parser matches at the current position, or when parsing succeeds but trailing input remains
This exception provides a context property for detailed error information.
Error Context
ParseContext tracks parsing failures to help build user-friendly error messages:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val letter = (+Regex("[a-z]")).value named "letter"
val digit = (+Regex("[0-9]")).value named "digit"
val identifier = letter * (letter + digit).zeroOrMore
fun main() {
    val result = identifier.parseAll("1abc")
    val exception = result.exceptionOrNull() as? ParseException
    check(exception != null) // Parsing fails
    check((exception.context.errorPosition ?: 0) == 0) // Failed at position 0
    val expected = exception.context.suggestedParsers.orEmpty()
        .mapNotNull { it.name }
        .distinct()
        .sorted()
        .joinToString(", ")
    check(expected == "letter") // Expected "letter"
}
Error Tracking Properties
- errorPosition - Furthest position attempted during parsing
- suggestedParsers - Set of parsers that failed at errorPosition
As parsing proceeds:
- When a parser fails at a position beyond errorPosition, errorPosition advances to that position and suggestedParsers is cleared
- Parsers failing at the current errorPosition are added to suggestedParsers
- Named parsers appear using their assigned names
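The update rule above can be sketched in plain Kotlin. This is only an illustration under stated assumptions: FailureTracker and recordFailure are hypothetical names, parsers are represented by their names alone, and the sketch assumes the parser that advances errorPosition is itself recorded after the set is cleared. It is not xarpeg's actual code.

```kotlin
// Hypothetical stand-in for the tracking rule described above; not xarpeg's implementation.
class FailureTracker {
    var errorPosition: Int? = null
        private set

    // Parsers are represented by name only, which is a simplification.
    val suggestedParsers = linkedSetOf<String>()

    // Record that the parser called `name` failed at `position`.
    fun recordFailure(name: String, position: Int) {
        val current = errorPosition
        when {
            current == null || position > current -> {
                // A failure further into the input discards the earlier suggestions.
                errorPosition = position
                suggestedParsers.clear()
                suggestedParsers.add(name)
            }
            // A failure at the same position accumulates.
            position == current -> suggestedParsers.add(name)
            // Failures before errorPosition are ignored.
        }
    }
}

fun main() {
    val tracker = FailureTracker()
    tracker.recordFailure("letter", 0)
    tracker.recordFailure("digit", 0)
    check(tracker.errorPosition == 0)
    check(tracker.suggestedParsers == setOf("letter", "digit"))

    tracker.recordFailure("operator", 2) // Further: replaces the earlier suggestions
    tracker.recordFailure("letter", 1)   // Earlier: ignored
    check(tracker.errorPosition == 2)
    check(tracker.suggestedParsers == setOf("operator"))
}
```

The point of the rule is that only the failures at the deepest position reached survive, which is usually where the user's mistake is.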
Using Error Context with Exceptions
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val number = (+Regex("[0-9]+")).value map { it.toInt() } named "number"
val operator = (+'*' + +'+') named "operator"
val expr = number * operator * number
fun main() {
    val result = expr.parseAll("42 + 10")
    val exception = result.exceptionOrNull() as? ParseException
    check(exception != null) // Parsing fails
    check((exception.context.errorPosition ?: 0) > 0) // Error position tracked
    val suggestions = exception.context.suggestedParsers.orEmpty().mapNotNull { it.name }
    check(suggestions.isNotEmpty()) // Has suggestions
}
Rich Error Messages
Use the formatMessage extension function to generate user-friendly error messages that include error position, expected elements, the source line, and a caret indicator pointing to the error location:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val number = (+Regex("[0-9]+")).value map { it.toInt() } named "number"
val operator = +'+' + +'-'
val expr = number * operator * number
fun main() {
    val input = "42*10"
    try {
        expr.parseAll(input).getOrThrow()
    } catch (exception: ParseException) {
        val message = exception.formatMessage()
        val lines = message.lines()
        check(lines[0] == "Syntax Error at 1:3")
        check(lines[1] == "Expect: \"+\", \"-\"")
        check(lines[2] == "Actual: \"*\"")
        check(lines[3] == "42*10")
        check(lines[4] == "  ^") // Caret under column 3
    }
}
The formatMessage function provides:
- Error line and column number
- List of expected named parsers (if available)
- The actual character found (or EOF)
- The source line where the error occurred
- A caret (^) symbol indicating the error position
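To make the line, column, and caret items concrete, here is a minimal sketch of how a character offset can be turned into that information. The names SourcePosition, locate, and caretLine are hypothetical and use only the Kotlin standard library; the real formatMessage may compute these differently.

```kotlin
// Hypothetical helpers for illustration; formatMessage's real implementation may differ.
data class SourcePosition(val line: Int, val column: Int, val lineText: String)

// Convert a 0-based offset into a 1-based line and column plus the text of that line.
fun locate(source: String, offset: Int): SourcePosition {
    require(offset in 0..source.length) { "offset out of range" }
    val before = source.substring(0, offset)
    val lineStart = before.lastIndexOf('\n') + 1
    val lineEnd = source.indexOf('\n', offset).let { if (it == -1) source.length else it }
    return SourcePosition(
        line = before.count { it == '\n' } + 1,
        column = offset - lineStart + 1,
        lineText = source.substring(lineStart, lineEnd),
    )
}

// Pad with spaces so the caret sits under the reported column.
fun caretLine(position: SourcePosition): String = " ".repeat(position.column - 1) + "^"

fun main() {
    val position = locate("42*10", 2)
    println("Syntax Error at ${position.line}:${position.column}")
    println(position.lineText)
    println(caretLine(position))
}
```

For the input "42*10" and offset 2, this yields line 1, column 3, and a caret preceded by two spaces, matching the shape of the message checked above.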
Memoization and Caching
Default Behavior
DefaultParseContext uses memoization by default to make backtracking predictable:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val parser = (+Regex("[a-z]+")).value named "word"
fun main() {
    // Memoization enabled (default)
    parser.parseAll("hello").getOrThrow()
}
Each (parser, position) pair is memoized, so when backtracking retries the same parser at the same position, the stored result is returned instead of re-running the parser.
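The idea of a table keyed by (parser, position) can be sketched without the library. MemoTable, parse, and lowercaseRun below are hypothetical names for illustration, and the sketch assumes failed attempts are cached alongside successes; xarpeg's internal table may be organized differently.

```kotlin
// Hypothetical miniature of a memo table; the names are illustrative, not xarpeg's API.
class MemoTable<R : Any> {
    private val cache = HashMap<Pair<String, Int>, R?>()

    // Number of times a parser body actually ran (cache misses).
    var evaluations = 0
        private set

    fun parse(parserId: String, position: Int, compute: () -> R?): R? {
        val key = parserId to position
        // containsKey rather than a null check: failures (null) are cached as well.
        if (cache.containsKey(key)) return cache[key]
        evaluations++
        return compute().also { cache[key] = it }
    }
}

// Toy parser: the run of lowercase letters starting at `position`, or null if there is none.
fun lowercaseRun(input: String, position: Int): String? =
    input.substring(position).takeWhile { it in 'a'..'z' }.ifEmpty { null }

fun main() {
    val memo = MemoTable<String>()
    val input = "hello1"
    check(memo.parse("word", 0) { lowercaseRun(input, 0) } == "hello")
    check(memo.parse("word", 0) { lowercaseRun(input, 0) } == "hello") // Served from the cache
    check(memo.parse("word", 5) { lowercaseRun(input, 5) } == null)    // Failure at a new position
    check(memo.parse("word", 5) { lowercaseRun(input, 5) } == null)    // Cached failure
    check(memo.evaluations == 2)
}
```

Four parse attempts run the parser body only twice, which is the saving memoization buys when alternatives revisit the same position.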
Disabling Memoization
Disable memoization for lower memory usage when your grammar doesn’t backtrack heavily:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val parser = (+Regex("[a-z]+")).value named "word"
fun main() {
    // Pass a context factory that turns memoization off
    parser.parseAll("hello") { s -> DefaultParseContext(s).also { it.useMemoization = false } }.getOrThrow()
}
Trade-offs:
- Memoization enabled - Higher memory, predictable performance with heavy backtracking
- Memoization disabled - Lower memory, but backtracking alternatives may re-run the same parser at the same position
State-Dependent Memoization
DefaultParseContext can be subclassed with custom mutable state. If that state affects parsing results, override getState() to partition the memo table by state. This prevents cached results from one state being reused when the state changes:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
class IndentAwareContext(src: String) : DefaultParseContext(src) {
    var indentLevel: Int = 0
    override fun getState(): Any = indentLevel
}
fun main() {
    val parser = +"hello"
    val context = IndentAwareContext("hello")
    context.indentLevel = 0
    check(context.parseOrNull(parser, 0) != null) // Cached under indentLevel=0
    context.indentLevel = 1
    check(context.parseOrNull(parser, 0) != null) // Re-evaluated under indentLevel=1
}
The default getState() returns Unit, so all results share a single memo table — equivalent to standard memoization. The returned value is used as a Map key, so it must implement equals and hashCode consistently.
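The equals and hashCode requirement is easy to get wrong, so here is a small standalone sketch of what it means for a map key. MemoKey is a hypothetical composite key used only for illustration; it models the state as one component of the key, which is an assumption and not necessarily how xarpeg lays out its tables.

```kotlin
// Hypothetical memo key; xarpeg's internal table layout may differ.
data class MemoKey(val state: Any, val parserId: String, val position: Int)

fun main() {
    val memo = HashMap<MemoKey, String>()

    // A list snapshot has structural equals/hashCode, so an equal state finds the entry.
    memo[MemoKey(listOf(0, 4), "line", 0)] = "cached"
    check(memo[MemoKey(listOf(0, 4), "line", 0)] == "cached")

    // A different state is a different partition, so there is no stale hit.
    check(memo[MemoKey(listOf(0), "line", 0)] == null)

    // Arrays compare by identity, so an equal-looking array state never hits the cache.
    memo[MemoKey(intArrayOf(0, 4), "line", 0)] = "cached"
    check(memo[MemoKey(intArrayOf(0, 4), "line", 0)] == null)
}
```

Values such as Int, String, data classes, and immutable list snapshots make safe states; arrays and objects without structural equality would silently defeat the cache.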
Error Propagation
If a map function throws an exception, it bubbles up and aborts parsing:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val divisionByZero = (+Regex("[0-9]+")).value map { value ->
    val n = value.toInt()
    if (n == 0) error("Cannot divide by zero")
    100 / n
} named "number"
fun main() {
    divisionByZero.parseAll("10").getOrThrow() // ✓ Returns 10
    // divisionByZero.parseAll("0").getOrThrow() // ✗ IllegalStateException
}
Validate before mapping or catch and wrap errors when recovery is needed.
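One way to catch and wrap is to keep the risky conversion in a helper that returns a Result, so a bad value becomes data instead of an exception. The helper below is a hypothetical, plain-Kotlin sketch (divideHundredBy is not part of xarpeg); calling it from a map lambda is an assumption about how you might wire it in.

```kotlin
// Hypothetical helper meant to be called from a map lambda; uses only the Kotlin standard library.
fun divideHundredBy(text: String): Result<Int> = runCatching {
    val n = text.toInt()                         // May throw NumberFormatException
    require(n != 0) { "Cannot divide by zero" }  // Validation failure becomes IllegalArgumentException
    100 / n
}

fun main() {
    check(divideHundredBy("10").getOrThrow() == 10)

    val failure = divideHundredBy("0")
    check(failure.isFailure)
    check(failure.exceptionOrNull() is IllegalArgumentException)
}
```

With this shape, the parsed value is a Result<Int>, and the caller decides whether to report the failure, substitute a default, or rethrow.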
Debugging Tips
Inspect Error Details from Result
Access the error context from the parse result:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val parser = +Regex("[a-z]+") named "word"
fun main() {
    val result = parser.parseAll("123")
    val exception = result.exceptionOrNull() as? ParseException
    check(exception != null) // Parsing fails
    check((exception.context.errorPosition ?: 0) == 0) // Error at position 0
    check(exception.context.suggestedParsers?.any { it.name == "word" } == true) // Suggests "word"
}
Check Rewind Behavior
Confirm how optional and zeroOrMore rewind on failure:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
val parser = (+Regex("[a-z]+") named "letters").optional * +Regex("[0-9]+") named "digits"
fun main() {
    // The optional letters parser matches nothing and rewinds, so the digits parser can succeed
    val result = parser.parseAll("123").getOrThrow()
    check(result != null) // Succeeds
}
Use Tests as Reference
Check the test suite for observed behavior:
- ErrorContextTest.kt - Error tracking examples
- ParserTest.kt - Comprehensive behavior tests
- MemoizationStateTest.kt - State-dependent memoization tests
Extending ParseContext
ParseContext is an interface. Its default implementation, DefaultParseContext, is declared as an open class, so you can extend it with custom state for specialized parsing needs.
Example: Indent-Based Language Support
You can extend DefaultParseContext to track indentation levels for Python-style languages:
import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*
fun main() {
    class IndentParseContext(
        src: String,
    ) : DefaultParseContext(src) {
        private val indentStack = mutableListOf(0)
        val currentIndent: Int get() = indentStack.last()
        val isInIndentBlock: Boolean get() = indentStack.size > 1
        fun pushIndent(indent: Int) {
            require(indent > currentIndent)
            indentStack.add(indent)
        }
        fun popIndent() {
            require(indentStack.size > 1)
            indentStack.removeLast()
        }
        // Required: return a snapshot of mutable state so that memoization
        // uses a separate cache table for each distinct indentation state.
        override fun getState(): Any = indentStack.toList()
    }
    val ctx = IndentParseContext("source")
    check(ctx.currentIndent == 0)
    check(!ctx.isInIndentBlock)
    ctx.pushIndent(4)
    check(ctx.currentIndent == 4)
    check(ctx.isInIndentBlock)
    ctx.popIndent()
    check(ctx.currentIndent == 0)
    check(!ctx.isInIndentBlock)
}
See the online-parser sample source code for a complete implementation.
Key Takeaways
- parseAll(...).getOrThrow() requires full consumption, throws on failure
- Error context provides errorPosition and suggestedParsers
- Named parsers appear in error messages with their assigned names
- Memoization is enabled by default; disable by providing a custom context factory. Override getState() in subclasses for state-dependent memoization
- Exceptions in map bubble up and abort parsing
- parseOrNull with ParseContext enables detailed debugging
- DefaultParseContext is extensible for custom parsing requirements
Next Steps
Learn how to extract position information for error reporting and source mapping.