RFC 016 - Break-After Setting
RFC 016: Break-After Setting
Section titled “RFC 016: Break-After Setting”Summary
Section titled “Summary”This RFC proposes a post-parse transformer that automatically inserts SegmentBreak nodes after configurable Unicode characters. The primary motivation is supporting the Devanagari Danda (।) and Double Danda (॥) as natural line-break points in Sanskrit verse content — without violating the Vyasa grammar rule that all commands begin with a backtick (`).
Motivation
Section titled “Motivation”The Problem
Section titled “The Problem”Sanskrit verses (ślokas) are traditionally structured around Danda punctuation:
धृतराष्ट्र उवाच ।धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः ।मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय ॥।(U+0964, Devanagari Danda) — marks the end of a half-verse (pāda)॥(U+0965, Devanagari Double Danda) — marks the end of a full verse
Currently, authors must use the pipe | character to create line breaks, which is foreign to the script and requires manual intervention. Ideally, authors should be able to write natural Devanagari text, and the compiler should understand that these punctuation marks imply line breaks.
Design Constraint
Section titled “Design Constraint”The Vyasa grammar maintains a clean rule: all commands begin with `. The pipe | is the sole exception as a structural break character. Adding more special characters to the parser would erode this consistency. Instead, we handle this at the transformer level — after parsing, before projection.
Proposed Solution
Section titled “Proposed Solution”Configuration
Section titled “Configuration”A new break_after setting accepts a string of characters. Each character in the string will trigger a SegmentBreak node to be inserted after it in text content.
'set settings { break_after = "।॥" }This is typically placed in the workspace’s root context.vy or a folder-level context.vy.
Transformer Behavior
Section titled “Transformer Behavior”A post-parse pass scans all Text nodes. When a character from break_after is found:
- The text is split at that character
- The character itself is preserved in the preceding text fragment
- A
SegmentBreaknode is inserted after it
Example: Given break_after = "।॥" and input text:
धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः । मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय ॥The transformer produces:
Text("धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः । ")SegmentBreakText("मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय ॥")SegmentBreakRendering
Section titled “Rendering”Backends render SegmentBreak as they already do:
- HTML:
<br /> - JSON: Preserved as a structural node
- SQLite: Segment boundary marker
The Danda characters appear in the output text as normal punctuation. The SegmentBreak simply adds the visual line break.
Applicability Beyond Sanskrit
Section titled “Applicability Beyond Sanskrit”The setting is intentionally script-neutral. Examples for other traditions:
| Script | Characters | Unicode | Use Case |
|---|---|---|---|
| Devanagari | । ॥ | U+0964, U+0965 | Sanskrit/Hindi verse endings |
| CJK | 。 | U+3002 | Chinese/Japanese sentence endings |
| Tibetan | ། | U+0F0D | Tibetan shad (verse marker) |
| Thai | ฯ | U+0E2F | Thai abbreviation/pause mark |
Implementation Plan
Section titled “Implementation Plan”Files to Modify
Section titled “Files to Modify”config.rs— Parse thebreak_aftersetting from'set settings { ... }and store it inVyasaEnvironmentmodels.rs— Addbreak_after: Option<String>toVyasaEnvironmenttransformer.rs(or new module) — Implement the text-splitting pass that runs after parsing and config processing- Pipeline integration — Insert the transformer into the compilation pipeline between parsing and projection
Execution Order
Section titled “Execution Order”Parse → Config (set commands) → Break-After Transform → Projection → BackendThe transformer must run after config processing (so break_after is available) and before projection (so templates see the SegmentBreak nodes).
Edge Cases
Section titled “Edge Cases”- Consecutive break characters:
॥।→ Two breaks emitted, one after each character - Break character at end of text: Still emits a
SegmentBreakafter it (the backend handles trailing breaks gracefully) - Break character inside commands: Only
Textnodes are scanned; command names, attributes, and arguments are untouched - Interaction with
|: The pipe remains a pure structural break (consumed, invisible in output).break_aftercharacters are preserved in output. Both produceSegmentBreaknodes
Status
Section titled “Status”Draft — Approved for implementation.