Skip to content

RFC 016 - Break-After Setting

This RFC proposes a post-parse transformer that automatically inserts SegmentBreak nodes after configurable Unicode characters. The primary motivation is supporting the Devanagari Danda () and Double Danda () as natural line-break points in Sanskrit verse content — without violating the Vyasa grammar rule that all commands begin with a backtick (`).

Sanskrit verses (ślokas) are traditionally structured around Danda punctuation:

धृतराष्ट्र उवाच ।
धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः ।
मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय ॥
  • (U+0964, Devanagari Danda) — marks the end of a half-verse (pāda)
  • (U+0965, Devanagari Double Danda) — marks the end of a full verse

Currently, authors must use the pipe | character to create line breaks, which is foreign to the script and requires manual intervention. Ideally, authors should be able to write natural Devanagari text, and the compiler should understand that these punctuation marks imply line breaks.

The Vyasa grammar maintains a clean rule: all commands begin with `. The pipe | is the sole exception as a structural break character. Adding more special characters to the parser would erode this consistency. Instead, we handle this at the transformer level — after parsing, before projection.

A new break_after setting accepts a string of characters. Each character in the string will trigger a SegmentBreak node to be inserted after it in text content.

'set settings { break_after = "।॥" }

This is typically placed in the workspace’s root context.vy or a folder-level context.vy.

A post-parse pass scans all Text nodes. When a character from break_after is found:

  1. The text is split at that character
  2. The character itself is preserved in the preceding text fragment
  3. A SegmentBreak node is inserted after it

Example: Given break_after = "।॥" and input text:

धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः । मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय ॥

The transformer produces:

Text("धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः । ")
SegmentBreak
Text("मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय ॥")
SegmentBreak

Backends render SegmentBreak as they already do:

  • HTML: <br />
  • JSON: Preserved as a structural node
  • SQLite: Segment boundary marker

The Danda characters appear in the output text as normal punctuation. The SegmentBreak simply adds the visual line break.

The setting is intentionally script-neutral. Examples for other traditions:

ScriptCharactersUnicodeUse Case
Devanagari U+0964, U+0965Sanskrit/Hindi verse endings
CJKU+3002Chinese/Japanese sentence endings
TibetanU+0F0DTibetan shad (verse marker)
ThaiU+0E2FThai abbreviation/pause mark
  1. config.rs — Parse the break_after setting from 'set settings { ... } and store it in VyasaEnvironment
  2. models.rs — Add break_after: Option<String> to VyasaEnvironment
  3. transformer.rs (or new module) — Implement the text-splitting pass that runs after parsing and config processing
  4. Pipeline integration — Insert the transformer into the compilation pipeline between parsing and projection
Parse → Config (set commands) → Break-After Transform → Projection → Backend

The transformer must run after config processing (so break_after is available) and before projection (so templates see the SegmentBreak nodes).

  1. Consecutive break characters: ॥। → Two breaks emitted, one after each character
  2. Break character at end of text: Still emits a SegmentBreak after it (the backend handles trailing breaks gracefully)
  3. Break character inside commands: Only Text nodes are scanned; command names, attributes, and arguments are untouched
  4. Interaction with |: The pipe remains a pure structural break (consumed, invisible in output). break_after characters are preserved in output. Both produce SegmentBreak nodes

Draft — Approved for implementation.