#+TITLE: Markdown + Code + Diff Rendering (v0.8.0)
#+DATE: 2026-05-11
#+AUTHOR: Amr Gharbeia / Hermes

* Overview

This module renders Markdown text, syntax-highlighted code blocks, and unified diffs in the terminal. It completes the rendering pipeline so that [[file:render.org][the render tree]] can handle rich formatted content.

The Markdown renderer is /not/ a general-purpose MD-to-HTML converter. It targets TUI output: node types that have clear terminal analogues (headings → bold/bright, code blocks → monochrome block, bold → ANSI bold, etc.). Edge cases that matter for a terminal (long lines, escape sequences inside code, mixed formatting) are handled explicitly.

** Design decisions

1. /Two-phase parse/: block-level first (lines), then inline (characters within each block). This matches how terminals render: block layout first, style within.
2. /Syntax highlighting by keyword set/: not a full lexer. A lookup table maps each language to (keywords, types, builtins) sets. Catches ~90% of highlighting cases without pulling in a parser. Fails safe (unmatched tokens render as plain text).
3. /Diff lines are self-describing/: a diff block starts with a `--- ` or `+++ ` header line, and each following line carries a `+` or `-` prefix. We don't re-parse patch semantics; we just color by prefix. This makes the renderer tolerant of malformed diffs.
4. /No recursive descent parser/: a simple state machine over lines for block-level, and a character cursor for inline. Keeps the code short and avoids parser-generator dependencies.

* Code structure

** Node types

We represent the parsed document as a tree of plists. Each node has at least a `:type` key. Block-level nodes carry a `:children` list of inline nodes. This keeps the data structure simple (no class hierarchy, no generic dispatch) while being easy to traverse for rendering.
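As a sketch of what such a tree looks like, here is a hand-built node for the source `# Hello *world*`, with a small traversal. The key layout (`:properties`, `:content`) follows the constructor and text extractor defined later in this file; ~count-nodes~ is a hypothetical helper used only for illustration.

#+BEGIN_SRC lisp :tangle no
;; A hand-built tree for the Markdown source "# Hello *world*".
;; Assumption: text nodes keep their string under :content.
(defparameter *example-heading*
  (list :type :heading
        :properties (list :level 1)
        :children (list (list :type :text :content "Hello ")
                        (list :type :italic
                              :children (list (list :type :text
                                                    :content "world"))))))

(defun count-nodes (node)
  "Count NODE plus all of its descendants (hypothetical helper)."
  (1+ (reduce #'+ (mapcar #'count-nodes (getf node :children)))))

(count-nodes *example-heading*) ; => 4 (heading, text, italic, text)
#+END_SRC

Because every node is a plain list, walking the tree needs nothing beyond ~getf~ and ~mapcar~.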
Node types:

| Block-level       | Inline         |
|-------------------+----------------|
| `:heading`        | `:text`        |
| `:paragraph`      | `:bold`        |
| `:code-block`     | `:italic`      |
| `:blockquote`     | `:inline-code` |
| `:list-item`      | `:link`        |
| `:ordered-item`   |                |
| `:thematic-break` |                |
| `:diff-block`     |                |

--- per-function: markdown-node-make

~make-md-node~ is a convenience constructor for node plists. It omits the `:children` key when no children are given, so `(getf node :children)` returns NIL and renderers can test it directly with `(when children ...)`.

#+BEGIN_SRC lisp :tangle no
(defun make-md-node (type &key children properties)
  "Create a markdown node plist.
TYPE is a keyword like :heading or :bold.
CHILDREN is a list of inline node plists (or NIL).
PROPERTIES is a plist of node-specific extra keys (e.g. :level for headings)."
  (let ((node (list :type type)))
    ;; Only store the optional keys when they carry a value.
    (when children
      (setf (getf node :children) children))
    (when properties
      (setf (getf node :properties) properties))
    node))
#+END_SRC

--- per-function: markdown-node-p

~md-node-p~ checks whether something is a markdown node plist. We just look for a :type key. This is used in tests and as a guard in recursive renderers.

#+BEGIN_SRC lisp :tangle no
(defun md-node-p (thing)
  "Return T if THING is a markdown node (has a :type key)."
  (and (listp thing)
       (getf thing :type)
       t))
#+END_SRC

--- per-function: markdown-node-text

~md-node-text~ extracts the plain text from a node tree by concatenating all :text children recursively, discarding markup. A link contributes its label followed by its URL in parentheses. This is useful for things like heading anchors, tooltip strings, or search indexing.

#+BEGIN_SRC lisp :tangle no
(defun md-node-text (node)
  "Recursively extract plain text from a markdown node tree."
  (let ((type (getf node :type)))
    (cond
      ;; Leaf: the text itself
      ((eql type :text)
       (or (getf node :content) ""))
      ;; Link: label text, then the URL in parentheses
      ((eql type :link)
       (concatenate 'string
                    (md-node-text (first (getf node :children)))
                    (format nil " (~a)" (or (getf node :url) ""))))
      ;; Any other node with children: concatenate their text
      ((getf node :children)
       (apply #'concatenate 'string
              (mapcar #'md-node-text (getf node :children))))
      (t ""))))
#+END_SRC

** Block-level parser

The block parser operates line-by-line with a simple state machine. Each line is classified by its prefix characters, then accumulated into a node.

Rules:

- Lines starting with `#` → heading (count hashes for level)
- Lines starting with `>` → blockquote (continuation lines merge)
- Lines starting with `-`, `*`, or `+` followed by a space → list-item
- Lines starting with 1-3 digits followed by `.` or `)` → ordered-item
- Lines starting with `` ``` `` or `~~~` → code-block (language on opening line)
- Lines consisting only of `---` or `***` → thematic-break
- Lines starting with `--- ` or `+++ ` → diff-block
- Empty lines → paragraph boundary
- Everything else → paragraph (continuation lines merge until blank)

--- per-function: classify-line

~classify-line~ returns a cons of a keyword and a data value for a trimmed line of text. The state machine uses this to decide what kind of block to create or continue.

The function must handle prefix stripping (e.g. remove `# ` after counting hashes). A `#` inside a code block is never classified at all, because the code-block parser consumes those lines verbatim.

One trap: a line like `#not-a-heading` (no space after hash) is NOT a heading in CommonMark. We check for a space or tab (or end of line) after the hashes.

Another trap: `* item` in a list vs `**bold**` inline. At the block-parser level we only look at /line-start/ `* ` (star + space) for list items. A line starting with `**bold**` or `** text` has a star, not a space, in second position, so it is not a list item; it falls through to paragraph and the inline parser handles the `**` normally.
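The heading trap can be checked in isolation. The sketch below is a minimal standalone version of the hash-counting test, assuming the line is already trimmed; ~heading-level~ is a hypothetical name and is not part of the module.

#+BEGIN_SRC lisp :tangle no
(defun heading-level (line)
  "Return the heading level (1-6) of LINE, or NIL if LINE is not a heading.
Hypothetical helper: counts leading hashes, then requires space, tab, or end."
  (let ((hashes (or (position-if-not (lambda (c) (char= c #\#)) line)
                    (length line))))
    (when (and (<= 1 hashes 6)
               (or (= hashes (length line))
                   (member (char line hashes) '(#\Space #\Tab))))
      hashes)))

(heading-level "## Title")       ; => 2
(heading-level "#not-a-heading") ; => NIL (no space after the hash)
(heading-level "####### seven")  ; => NIL (more than six hashes)
#+END_SRC

Checking for end of line before reading the character after the hashes also keeps a bare `#` from indexing past the end of the string.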
#+BEGIN_SRC lisp :tangle no
(defun classify-line (line)
  "Classify a trimmed LINE, returning (type . data).
TYPE is a keyword; DATA is language for code-blocks, level for headings, etc."
  (flet ((leading-run (char)
           ;; Number of consecutive CHAR characters at the start of LINE.
           (or (position-if-not (lambda (c) (char= c char)) line)
               (length line))))
    (cond
      ;; Blank line: empty or whitespace only
      ((every (lambda (c) (find c '(#\Space #\Tab))) line)
       (cons :blank nil))
      ;; Thematic break: --- or *** (3+ markers, all same, optional whitespace)
      ((and (find (char line 0) "-*")
            (>= (count (char line 0) line) 3)
            (every (lambda (c)
                     (or (char= c (char line 0))
                         (char= c #\Space)
                         (char= c #\Tab)))
                   line))
       (cons :thematic-break nil))
      ;; Heading: 1-6 hashes, then space/tab or end of line
      ((let ((hashes (leading-run #\#)))
         (and (<= 1 hashes 6)
              (or (= hashes (length line))
                  (member (char line hashes) '(#\Space #\Tab)))))
       (let ((hashes (leading-run #\#)))
         (cons :heading
               (cons hashes
                     (string-trim (list #\Space #\Tab)
                                  (subseq line hashes))))))
      ;; Blockquote: >
      ((char= (char line 0) #\>)
       (cons :blockquote
             (string-trim (list #\Space #\Tab) (subseq line 1))))
      ;; Unordered list: -, *, + followed by a space
      ((and (>= (length line) 2)
            (find (char line 0) "-*+")
            (char= (char line 1) #\Space))
       (cons :list-item
             (string-trim (list #\Space #\Tab) (subseq line 2))))
      ;; Ordered list: 1-3 digits, then . or ), then a space
      ((let ((end (position-if-not #'digit-char-p line)))
         (and end
              (<= 1 end 3)
              (find (char line end) ".)")
              (< (1+ end) (length line))
              (char= (char line (1+ end)) #\Space)))
       (let ((end (position-if-not #'digit-char-p line)))
         (cons :ordered-item
               (string-trim (list #\Space #\Tab)
                            (subseq line (+ end 2))))))
      ;; Diff: --- file or +++ file
      ((and (>= (length line) 4)
            (find (char line 0) "-+")
            (char= (char line 1) (char line 0))
            (char= (char line 2) (char line 0))
            (char= (char line 3) #\Space))
       (cons :diff-header line))
      ;; Diff: line content with +/- prefix
      ((and (find (char line 0) "-+")
            (not (and (>= (length line) 3)
                      (char= (char line 1) (char line 0))
                      (char= (char line 2) (char line 0)))))
       (cons :diff-line (cons (char line 0) (subseq line 1))))
      ;; Fenced code block start: 3+ backticks or tildes, then the language
      ((and (find (char line 0) "`~")
            (>= (leading-run (char line 0)) 3))
       (cons :code-start
             (string-trim (list #\Space #\Tab)
                          (subseq line (leading-run (char line 0))))))
      ;; Default: paragraph content
      (t (cons :paragraph line)))))
#+END_SRC

--- per-function: parse-blocks

~parse-blocks~ is the main block-level parser. It takes a string (possibly multi-line) and returns a list of markdown node plists.

The algorithm:

1. Split into lines
2. Classify each line
3. Accumulate lines of the same type into groups
4. Convert each group into a node

State transitions:

- `:paragraph` accumulates until blank line or different block type
- `:blockquote` accumulates until blank line
- `:list-item` and `:ordered-item` accumulate until blank line
- `:code-start` flips to code-block mode; accumulates until matching fence closer or end of input
- `:diff-header` starts a diff block; diff lines accumulate until blank line or non-diff line

Edge case: a paragraph followed by a list item should stay as separate blocks (not merge). ~parse-paragraph~ handles this because it stops at the first blank line or the first line that starts another block type.

Each block helper returns a cons of its result and the index of the next unconsumed line. ~parse-list~ returns a list of item nodes rather than a single node, so its items are spliced into the output one by one. A `+`/`-` line that appears outside a diff block is treated as paragraph text instead of being dropped.

#+BEGIN_SRC lisp :tangle no
(defun parse-blocks (text)
  "Parse TEXT (a string) into a list of block-level markdown node plists.
Returns the nodes in reading order."
  (let ((lines (split-string-into-lines text))
        (nodes nil)
        (i 0))
    (flet ((take (result)
             ;; RESULT is (node . next-index), as returned by the helpers.
             (push (car result) nodes)
             (setf i (cdr result)))
           (take-all (result)
             ;; RESULT is (nodes . next-index); one node per list item.
             (dolist (node (car result))
               (push node nodes))
             (setf i (cdr result))))
      (loop while (< i (length lines))
            do (let* ((line (string-trim (list #\Return) (aref lines i)))
                      (classification (classify-line line)))
                 (case (car classification)
                   (:blank (incf i))
                   (:thematic-break
                    (push (make-md-node :thematic-break) nodes)
                    (incf i))
                   ;; A stray +/- line outside a diff block is ordinary text.
                   ((:paragraph :diff-line)
                    (take (parse-paragraph lines i)))
                   (:heading
                    (destructuring-bind (level . content) (cdr classification)
                      (push (make-md-node :heading
                                          :properties (list :level level)
                                          :children (parse-inline content))
                            nodes))
                    (incf i))
                   (:blockquote
                    (take (parse-blockquote lines i)))
                   (:list-item
                    (take-all (parse-list lines i :unordered)))
                   (:ordered-item
                    (take-all (parse-list lines i :ordered)))
                   (:code-start
                    (take (parse-code-block lines i (cdr classification))))
                   (:diff-header
                    (take (parse-diff-block lines i)))
                   (t (incf i)))))
      ;; Return in reading order
      (nreverse nodes))))
#+END_SRC

--- per-function: split-string-into-lines

~split-string-into-lines~ is a small utility that saves us from depending on ~cl-ppcre~. It splits on #\Newline, drops the #\Return of a CRLF pair, and handles trailing newlines (it doesn't produce an extra empty line at the end).

#+BEGIN_SRC lisp :tangle no
(defun split-string-into-lines (string)
  "Split STRING into a vector of lines (no trailing newline).
Handles \\n, \\r\\n, and trailing newlines properly."
  (let ((result nil)
        (start 0))
    (loop for i from 0 below (length string)
          when (char= (char string i) #\Newline)
            do (let ((end (if (and (> i start)
                                   (char= (char string (1- i)) #\Return))
                              (1- i)  ; CRLF: drop the carriage return
                              i)))
                 (push (subseq string start end) result)
                 (setf start (1+ i))))
    ;; Final line without a trailing newline
    (when (< start (length string))
      (push (subseq string start) result))
    (coerce (nreverse result) 'vector)))
#+END_SRC

--- per-function: parse-paragraph

~parse-paragraph~ collects one or more contiguous paragraph lines until a blank line or a different block type, and returns a :paragraph node with inline-parsed children.

Continuation lines are joined with a single space (not a newline). This matches Markdown's soft-wrap convention, where a newline in the source becomes a space in the output. To force a hard break, CommonMark uses two trailing spaces; we skip that for now since it's rare in TUI contexts.

#+BEGIN_SRC lisp :tangle no
(defun parse-paragraph (lines start)
  "Parse contiguous paragraph lines from LINES starting at START.
Returns (node . consumed-index)."
  (let ((text-parts nil)
        (i start))
    (loop while (< i (length lines))
          do (let* ((raw-line (aref lines i))
                    (line (string-trim (list #\Return) raw-line))
                    (class (classify-line line)))
               (case (car class)
                 ;; Stray +/- lines outside a diff block count as text.
                 ((:paragraph :diff-line)
                  (push line text-parts)
                  (incf i))
                 (:blank
                  (incf i)
                  (loop-finish))
                 (t (loop-finish)))))
    ;; Join the collected lines with single spaces.
    (let ((text (with-output-to-string (s)
                  (loop for part in (nreverse text-parts)
                        for first = t then nil
                        do (unless first (write-char #\Space s))
                           (princ part s)))))
      (cons (make-md-node :paragraph :children (parse-inline text))
            i))))
#+END_SRC

--- per-function: parse-blockquote

~parse-blockquote~ collects contiguous `>` lines, strips the `>` prefix, joins them, and wraps in a :blockquote node.
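The strip-and-join step can be sketched on its own. ~strip-quote-marker~ and ~join-quote-lines~ are hypothetical names used only for this illustration; in the module itself the work is split between ~classify-line~ and ~parse-blockquote~.

#+BEGIN_SRC lisp :tangle no
(defun strip-quote-marker (line)
  "Remove one leading > and the whitespace around the remaining text.
Assumes LINE is non-empty and starts with >."
  (string-trim '(#\Space #\Tab) (subseq line 1)))

(defun join-quote-lines (lines)
  "Strip the > marker from each of LINES and join the results with spaces."
  (format nil "~{~a~^ ~}" (mapcar #'strip-quote-marker lines)))

(join-quote-lines '("> first line" ">second line"))
;; => "first line second line"
#+END_SRC

Only one marker is removed per line, so any further `>` stays in the text.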
Nested blockquotes (`> >`) are not supported in this version: a `>` at the start of the content is treated as literal text.

#+BEGIN_SRC lisp :tangle no
(defun parse-blockquote (lines start)
  "Parse contiguous blockquote lines from LINES starting at START.
Returns (node . consumed-index)."
  (let ((text-parts nil)
        (i start))
    (loop while (< i (length lines))
          do (let* ((raw-line (aref lines i))
                    (line (string-trim (list #\Return) raw-line))
                    (class (classify-line line)))
               (case (car class)
                 (:blockquote
                  (push (cdr class) text-parts)
                  (incf i))
                 (:blank
                  (incf i)
                  (loop-finish))
                 (t (loop-finish)))))
    ;; Join the stripped lines with single spaces.
    (let ((text (with-output-to-string (s)
                  (loop for part in (nreverse text-parts)
                        for first = t then nil
                        do (unless first (write-char #\Space s))
                           (princ part s)))))
      (cons (make-md-node :blockquote :children (parse-inline text))
            i))))
#+END_SRC

--- per-function: parse-list

~parse-list~ collects contiguous list items and returns a list of nodes. Each line starting with a list marker becomes one list-item node. Nested lists are not supported (lines starting with two spaces + marker would be the next level; we skip that for now).

The TYPE parameter is either `:unordered` or `:ordered`. It is currently ignored: each item is labeled by its actual marker type, since we already classified each line.

#+BEGIN_SRC lisp :tangle no
(defun parse-list (lines start type)
  "Parse contiguous list items from LINES starting at START.
TYPE is :unordered or :ordered.
Returns (nodes . consumed-index) where NODES is a list of :list-item
and :ordered-item nodes."
  (declare (ignore type))
  (let ((items nil)
        (i start))
    ;; Collect all contiguous list items into ITEMS
    (loop while (< i (length lines))
          do (let* ((raw-line (aref lines i))
                    (line (string-trim (list #\Return) raw-line))
                    (class (classify-line line)))
               (case (car class)
                 ((:list-item :ordered-item)
                  (push (cons (car class) (cdr class)) items)
                  (incf i))
                 (:blank
                  ;; A blank line followed by another item continues the
                  ;; list; a blank line followed by anything else ends it.
                  (if (and (< (1+ i) (length lines))
                           (let ((next-class
                                   (classify-line
                                    (string-trim (list #\Return)
                                                 (aref lines (1+ i))))))
                             (member (car next-class)
                                     '(:list-item :ordered-item))))
                      (progn
                        (push (cons :blank-sep nil) items)
                        (incf i))
                      (progn
                        (incf i)
                        (loop-finish))))
                 (t (loop-finish)))))
    ;; Convert each item to a node (separators and empty items are skipped)
    (let ((nodes nil))
      (dolist (item (nreverse items))
        (let ((type (car item))
              (content (cdr item)))
          (when (and content (not (string= content "")))
            (push (make-md-node type :children (parse-inline content))
                  nodes))))
      (cons (nreverse nodes) i))))
#+END_SRC

--- per-function: parse-code-block

~parse-code-block~ reads from the line after the opening fence to the closing fence (or end of input). It returns a :code-block node with the language (or NIL) and the raw text as the :content. No inline parsing is done inside code blocks: everything is literal.

Matching fence: if opened with `` ``` ``, close with `` ``` ``. If opened with `~~~`, close with `~~~`. CommonMark lets the closing fence be longer than the opening fence (at least as many backticks/tildes). We use the simpler version: same character, same count, optionally followed by whitespace.

#+BEGIN_SRC lisp :tangle no
(defun parse-code-block (lines start lang)
  "Parse a fenced code block from LINES starting at START.
LANG is the language string (or empty string) from the opening fence.
Returns (node . consumed-index)."
  (let ((code-lines nil)
        (i (1+ start))
        (fence-char (char (aref lines start) 0))
        (fence-len (loop for c across (aref lines start)
                         while (char= c (char (aref lines start) 0))
                         count c))
        (found-close nil))
    (loop while (< i (length lines))
          do (let* ((raw-line (aref lines i))
                    (line (string-trim (list #\Return) raw-line)))
               ;; Check for closing fence: FENCE-LEN fence characters,
               ;; then nothing but spaces or tabs
               (when (and (>= (length line) fence-len)
                          (every (lambda (c) (char= c fence-char))
                                 (subseq line 0 fence-len))
                          (or (= (length line) fence-len)
                              (every (lambda (c) (find c '(#\Space #\Tab)))
                                     (subseq line fence-len))))
                 (setf found-close t)
                 (incf i)
                 (loop-finish))