Step 6: Template Strings

Learn how PEG-style parsers naturally handle template strings with embedded expressions—without tokenization.

Why No Tokenizer?

Traditional parsers with separate lexer/tokenizer phases struggle with template strings like "hello $(1+2) world":

PEG parsers working character-by-character naturally handle context switches without designing complicated token rules.

Complete Template String Parser

Here’s a parser for template strings with embedded arithmetic expressions:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

sealed class TemplateElement
data class StringPart(val text: String) : TemplateElement()
data class ExpressionPart(val value: Int) : TemplateElement()

val templateStringParser: Parser<String> = object {
    // Expression parser (arithmetic with precedence)
    val number = +Regex("[0-9]+") map { it.value.toInt() } named "number"
    val grouped: Parser<Int> = -'(' * ref { sum } * -')'
    val factor: Parser<Int> = number + grouped
    val product = leftAssociative(factor, -'*') { a, _, b -> a * b }
    val sum: Parser<Int> = leftAssociative(product, -'+') { a, _, b -> a + b }
    val expression = sum

    // String parts: match everything except $( and closing "
    val stringPart: Parser<TemplateElement> =
        +Regex("""[^"$]+|\$(?!\()""") map { match ->
            StringPart(match.value)
        } named "string_part"

    // Expression part: $(...)
    val expressionPart: Parser<TemplateElement> =
        -+"$(" * expression * -')' map { value ->
            ExpressionPart(value)
        }

    // Template elements can be string parts or expression parts
    val templateElement = expressionPart + stringPart

    // Complete template string: "..." with any number of elements
    val templateString: Parser<String> =
        -'"' * templateElement.zeroOrMore * -'"' map { elements ->
            elements.joinToString("") { element ->
                when (element) {
                    is StringPart -> element.text
                    is ExpressionPart -> element.value.toString()
                }
            }
        }

    val root = templateString
}.root

fun main() {
    check(templateStringParser.parseAll(""""hello"""").getOrThrow() == "hello")
    check(templateStringParser.parseAll(""""result: $(1+2)"""").getOrThrow() == "result: 3")
    check(templateStringParser.parseAll(""""$(2*(3+4)) = answer"""").getOrThrow() == "14 = answer")
    check(templateStringParser.parseAll(""""a$(1)b$(2)c$(3)d"""").getOrThrow() == "a1b2c3d")
}

Note: In Kotlin string literals, """"hello"""" represents the input "hello" because inner quotes must be escaped.

How It Works

The Key: Smart Regex Boundaries

The pattern +Regex("""[^"$]+|\$(?!\()""") matches:

The regex naturally stops at template boundaries ($() without explicit tokenization. When $( is encountered, control passes to expressionPart, which recursively invokes the expression parser.

Context Switching

The choice combinator val templateElement = expressionPart + stringPart handles context switching.

Try to parse an expression first. If that fails (no $( found), parse a string part. This naturally alternates between contexts as needed.

Recursion

The grouped parser uses ref { sum } (example: val grouped: Parser<Int> = -'(' * ref { sum } * -')') to allow parenthesized sub-expressions.

This enables nested expressions like $(2*(3+4)).

Nested Template Strings

Extend the pattern to handle strings inside expressions:

import io.github.mirrgieriana.xarpeg.*
import io.github.mirrgieriana.xarpeg.parsers.*

sealed class TemplateElement
data class StringPart(val text: String) : TemplateElement()
data class ExpressionPart(val value: Int) : TemplateElement()

object TemplateWithNestedStrings {
    val number = +Regex("[0-9]+") map { it.value.toInt() } named "number"
    val grouped: Parser<Int> = -'(' * ref { sum } * -')'

    val stringPart: Parser<TemplateElement> =
        +Regex("""[^"$]+|\$(?!\()""") map { match -> StringPart(match.value) } named "string_part"

    val expressionPart: Parser<TemplateElement> =
        -+"$(" * ref { sum } * -')' map { value ->
            ExpressionPart(value)
        }

    val templateElement = expressionPart + stringPart

    val templateString: Parser<String> = ref {
        -'"' * templateElement.zeroOrMore * -'"' map { elements ->
            elements.joinToString("") { element ->
                when (element) {
                    is StringPart -> element.text
                    is ExpressionPart -> element.value.toString()
                }
            }
        }
    }

    // Expressions can now contain template strings
    val factor: Parser<Int> = number + grouped + (templateString map { it.length })
    val sum: Parser<Int> = leftAssociative(factor, -'+') { a, _, b -> a + b }
}

fun main() {
    check(TemplateWithNestedStrings.templateString.parseAll("\"nested $(1+2)\"").getOrThrow() == "nested 3")
}

The expression parser can recursively call the template string parser, and vice versa. This mutual recursion works naturally with PEG—no pre-tokenization complexity.

Benefits Recap

Natural context switching: The parser adapts based on what it has seen, not predetermined token boundaries.

Simpler grammar: No complex token rules that must handle all possible contexts upfront.

Recursive embedding: Expressions can contain strings, strings can contain expressions—without special cases.

Regex-based boundaries: Use negative lookahead and character classes to define natural stopping points.

Extending Further

This approach scales well to more complex scenarios:

Multiple delimiters - Support both $(...) and #{...} by creating multiple expression part parsers and combining them.

Escape sequences - Match \$( as literal text using patterns like +Regex("""(?:[^"$\\]|\\.)+|\$(?!\()""").

Different quote styles - Support single and double quotes by creating parsers for each style and combining with choice.

Each addition is a localized change to the relevant parser, not a redesign of the entire token vocabulary.

Key Takeaways

Congratulations!

You’ve completed the Xarpeg tutorial! You now know how to:

Next Steps

Back to Tutorial Index