v0.8.0: Markdown + Code + Diff rendering module
Add cl-tui.markdown package with:
- Markdown parser: headings, paragraphs, bold, italic, inline-code, links, code blocks, blockquotes, lists, thematic breaks
- Syntax highlighting: Lisp, Python, JavaScript, Bash with keyword, builtin, comment, number, function coloring
- Diff renderer: colorized unified diff (+/-/@ lines)
- Terminal renderer: ANSI escape sequences via backend-style functions
- 67 tests, 100% passing
- All parser helpers use values returns (not cons) for multiple-value-bind

ASDF: v0.7.0 -> v0.8.0, new markdown module + test suite
org/markdown-renderer.org (new file, 500 lines)
@@ -0,0 +1,500 @@
#+TITLE: Markdown + Code + Diff Rendering (v0.8.0)
#+DATE: 2026-05-11
#+AUTHOR: Amr Gharbeia / Hermes

* Overview

This module provides rendering of Markdown text, syntax-highlighted code
blocks, and unified diffs in the terminal. It completes the rendering
pipeline so that [[file:render.org][the render tree]] can handle rich formatted
content.

The Markdown renderer is /not/ a general-purpose MD-to-HTML converter.
It targets TUI output: node types that have clear terminal analogues
(headings → bold/bright, code blocks → monochrome block, bold → ANSI
bold, etc.). Edge cases that matter for a terminal (long lines, escape
sequences inside code, mixed formatting) are handled explicitly.

** Design decisions

1. /Two-phase parse/: block-level first (lines), then inline (characters
   within each block). This matches how terminals render: block layout
   first, style within.
2. /Syntax highlighting by keyword set/: not a full lexer. A lookup
   table of language → (keywords, types, builtins) sets. Catches ~90%
   of highlighting cases without pulling in a parser. Fails safe
   (unmatched tokens render as plain text).
3. /Diff lines are self-describing/: a diff block starts with a `--- ` or
   `+++ ` header line, and each following line is recognized by its
   prefix (`+`, `-`, or `@`). We don't re-parse patch semantics;
   we just color by prefix. This makes the renderer tolerant of
   malformed diffs.
4. /No recursive descent parser/: a simple state machine over lines for
   block-level, and a character cursor for inline. Keeps the code
   short and avoids parser-generator dependencies.
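
Decision 3 can be illustrated with a minimal sketch. The function name
~sketch-diff-line-style~ and the style keywords it returns are
hypothetical and not part of the module; the point is only that a
style can be chosen from the prefix alone, with file headers tested
before single-character prefixes.

#+BEGIN_SRC lisp :tangle no
;; Hypothetical helper: choose a style keyword for one diff line by prefix.
(defun sketch-diff-line-style (line)
  "Return a keyword naming the style for one diff LINE, judged by prefix only."
  (cond ((zerop (length line)) :context)
        ;; File headers must be tested before the single-character prefixes,
        ;; otherwise \"--- a/file\" would be styled as a removed line.
        ((and (>= (length line) 4)
              (or (string= "--- " line :end2 4)
                  (string= "+++ " line :end2 4)))
         :header)
        ((char= (char line 0) #\@) :hunk)
        ((char= (char line 0) #\+) :added)
        ((char= (char line 0) #\-) :removed)
        ;; Anything else (including malformed lines) is plain context.
        (t :context)))
#+END_SRC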

* Code structure

** Node types

We represent the parsed document as a tree of plists. Each node has at
least a `:type` key. Block-level nodes carry a `:children` list of
inline nodes. This keeps the data structure simple (no class hierarchy,
no generic dispatch) while being easy to traverse for rendering.

Node types:

| Block-level       | Inline         |
|-------------------+----------------|
| `:heading`        | `:text`        |
| `:paragraph`      | `:bold`        |
| `:code-block`     | `:italic`      |
| `:blockquote`     | `:inline-code` |
| `:list-item`      | `:link`        |
| `:ordered-item`   |                |
| `:thematic-break` |                |
| `:diff-block`     |                |
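
As an illustration of the plist tree, here is a hand-built sample and a
small traversal. The names ~*sketch-heading*~ and ~sketch-count-nodes~
are hypothetical. The key layout is an assumption based on the rest of
this file: `:content` directly on `:text` nodes, and heading `:level`
under `:properties`.

#+BEGIN_SRC lisp :tangle no
;; Hypothetical sample: roughly what "# Hi *there*" might parse to.
(defparameter *sketch-heading*
  (list :type :heading
        :properties (list :level 1)
        :children (list (list :type :text :content "Hi ")
                        (list :type :italic
                              :children (list (list :type :text
                                                    :content "there"))))))

(defun sketch-count-nodes (node)
  "Count NODE plus all of its descendants reachable through :children."
  ;; GETF returns NIL for a missing :children key, so leaves count as 1.
  (1+ (reduce #'+ (mapcar #'sketch-count-nodes (getf node :children)))))
#+END_SRC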

--- per-function: markdown-node-make

~make-md-node~ is a convenience constructor for node plists.
It omits the `:children` and `:properties` keys when they are not
supplied, so `(getf node :children)` returns NIL for childless nodes
and renderers can use a plain `(when children ...)` check.

#+BEGIN_SRC lisp :tangle no
(defun make-md-node (type &key children properties)
  "Create a markdown node plist.
TYPE is a keyword like :heading or :bold.
CHILDREN is a list of inline node plists (or NIL).
PROPERTIES is a plist of node-specific extra keys (e.g. :level for headings)."
  ;; :type always comes first; optional keys appear only when supplied.
  (append (list :type type)
          (when children (list :children children))
          (when properties (list :properties properties))))
#+END_SRC

--- per-function: markdown-node-p

~md-node-p~ checks whether something is a markdown node plist.
We just look for a :type key. This is used in tests and as
a guard in recursive renderers.

#+BEGIN_SRC lisp :tangle no
(defun md-node-p (thing)
  "Return T if THING is a markdown node (a list with a :type key)."
  ;; The trailing T turns the type keyword into a plain boolean.
  (and (listp thing) (getf thing :type) t))
#+END_SRC

--- per-function: markdown-node-text

~md-node-text~ extracts the plain text from a node tree by
concatenating all :text descendants recursively, discarding markup.
The one exception is `:link`, whose URL is appended in parentheses
after the label. This is useful for things like heading anchors,
tooltip strings, or search indexing.

#+BEGIN_SRC lisp :tangle no
(defun md-node-text (node)
  "Recursively extract plain text from a markdown node tree."
  (let ((type (getf node :type)))
    (cond ((eql type :text)
           (or (getf node :content) ""))
          ;; Links: the text of every label child, then the URL in parentheses.
          ((eql type :link)
           (format nil "~{~a~} (~a)"
                   (mapcar #'md-node-text (getf node :children))
                   (or (getf node :url) "")))
          ;; Any other node with children: concatenate the children's text.
          ((getf node :children)
           (apply #'concatenate 'string
                  (mapcar #'md-node-text (getf node :children))))
          (t ""))))
#+END_SRC

** Block-level parser

The block parser operates line-by-line with a simple state machine.
Each line is classified by its prefix characters, then accumulated
into a node.

Rules:
- Lines starting with 1-6 `#` followed by a space or tab → heading (count hashes for level)
- Lines starting with `>` → blockquote (continuation lines merge)
- Lines starting with `-`, `*`, or `+` followed by a space → list-item
- Lines starting with 1-3 digits followed by `.` or `)` → ordered-item
- Lines starting with `` ``` `` or `~~~` → code-block (language on opening line)
- Lines consisting only of three or more `-` or `*` → thematic-break
- Lines starting with `--- ` or `+++ ` → diff-block
- Empty lines → paragraph boundary
- Everything else → paragraph (continuation lines merge until blank)
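
The heading rule can be sketched in isolation. The name
~sketch-heading-level~ is hypothetical and not part of the module; it
only shows the "count hashes, then require whitespace or end of line"
check under the assumption that the line has already been trimmed.

#+BEGIN_SRC lisp :tangle no
;; Hypothetical helper illustrating the heading rule.
(defun sketch-heading-level (line)
  "Return the heading level (1-6) of LINE, or NIL if it is not a heading."
  (let ((hashes (or (position-if-not (lambda (c) (char= c #\#)) line)
                    (length line))))          ; line made only of hashes
    (and (<= 1 hashes 6)
         ;; The hashes must be followed by whitespace or end the line.
         (or (= hashes (length line))
             (member (char line hashes) '(#\Space #\Tab)))
         hashes)))
#+END_SRC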

--- per-function: classify-line

~classify-line~ returns a cons of a keyword and a data value for a
trimmed line of text. The state machine uses this to decide what kind
of block to create or continue.

The function must handle prefix stripping (e.g. remove `# ` after
counting hashes). It is never called on lines inside a code block:
~parse-code-block~ consumes those lines itself, so a `#` inside code
is not misread as a heading.

One trap: a line like `#not-a-heading` (no space after hash) is NOT
a heading in CommonMark. We check for space/tab (or end of line) after
the hashes.

Another trap: `* item` in a list vs `**bold**` inline. At the
block-parser level we only look at /line-start/ `* ` (star + space)
for list items. A line starting with `**bold**` has no space after the
first star, so it falls through to `:paragraph` and the inline parser
handles the `**` markers.

#+BEGIN_SRC lisp :tangle no
(defun classify-line (line)
  "Classify a trimmed LINE, returning (type . data).
TYPE is a keyword; DATA is language for code-blocks, level for headings, etc."
  (flet ((run-length (char)
           ;; Number of consecutive CHARs at the start of LINE.
           (or (position-if-not (lambda (c) (char= c char)) line)
               (length line))))
    (cond
      ;; Empty line
      ((string= line "") (cons :blank nil))
      ;; Thematic break: 3+ of the same - or *, optional whitespace between
      ((and (find (char line 0) "-*")
            (>= (count (char line 0) line) 3)
            (every (lambda (c) (or (char= c (char line 0))
                                   (char= c #\Space)
                                   (char= c #\Tab)))
                   line))
       (cons :thematic-break nil))
      ;; Heading: 1-6 hashes followed by space/tab or end of line
      ((let ((hashes (run-length #\#)))
         (and (<= 1 hashes 6)
              (or (= hashes (length line))
                  (member (char line hashes) '(#\Space #\Tab)))))
       (let ((hashes (run-length #\#)))
         (cons :heading
               (cons hashes
                     (string-trim (list #\Space #\Tab)
                                  (subseq line hashes))))))
      ;; Blockquote: >
      ((char= (char line 0) #\>)
       (cons :blockquote (string-trim (list #\Space #\Tab) (subseq line 1))))
      ;; Unordered list: -, *, + followed by a space
      ((and (>= (length line) 2)
            (find (char line 0) "-*+")
            (char= (char line 1) #\Space))
       (cons :list-item (string-trim (list #\Space #\Tab) (subseq line 2))))
      ;; Ordered list: 1-3 digits, then . or ), then a space
      ((let ((end (position-if-not #'digit-char-p line)))
         (and end
              (<= 1 end 3)
              (find (char line end) ".)")
              (< (1+ end) (length line))
              (char= (char line (1+ end)) #\Space)))
       (let ((end (position-if-not #'digit-char-p line)))
         (cons :ordered-item
               (string-trim (list #\Space #\Tab) (subseq line (+ end 2))))))
      ;; Diff: --- file or +++ file
      ((and (>= (length line) 4)
            (or (string= "--- " line :end2 4)
                (string= "+++ " line :end2 4)))
       (cons :diff-header line))
      ;; Diff: line content with +/- prefix (but not a run of three)
      ((and (find (char line 0) "-+")
            (< (run-length (char line 0)) 3))
       (cons :diff-line (cons (char line 0) (subseq line 1))))
      ;; Fenced code block start: 3+ backticks or tildes, then the language
      ((and (find (char line 0) "`~")
            (>= (run-length (char line 0)) 3))
       (cons :code-start
             (string-trim (list #\Space #\Tab)
                          (subseq line (run-length (char line 0))))))
      ;; Default: paragraph content
      (t (cons :paragraph line)))))
#+END_SRC
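
The order of the tests matters: `---` alone is a thematic break, while
`--- a/file` is a diff header. A minimal sketch of the thematic-break
test on its own shows why the second line is rejected. The name
~sketch-thematic-break-p~ is hypothetical, and the "at least three
markers" reading is an assumption taken from the comment in the code
above.

#+BEGIN_SRC lisp :tangle no
;; Hypothetical predicate: the thematic-break test in isolation.
(defun sketch-thematic-break-p (line)
  "True if LINE holds at least three identical - or * markers and otherwise only blanks."
  (and (plusp (length line))
       (find (char line 0) "-*")
       (>= (count (char line 0) line) 3)
       ;; Any character other than the marker, space, or tab disqualifies the line.
       (every (lambda (c)
                (or (char= c (char line 0))
                    (char= c #\Space)
                    (char= c #\Tab)))
              line)))
#+END_SRC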

--- per-function: parse-blocks

~parse-blocks~ is the main block-level parser. It takes a string
(possibly multi-line) and returns a list of markdown node plists.

The algorithm:
1. Split into lines
2. Classify each line
3. Accumulate lines of the same type into groups
4. Convert each group into a node

State transitions:
- `:paragraph` accumulates until blank line or different block type
- `:blockquote` accumulates until blank line or non-blockquote line
- `:list-item` and `:ordered-item` accumulate until a non-list line;
  a single blank line between two items is tolerated
- `:code-start` flips to code-block mode; accumulates until matching
  fence closer or end of input
- `:diff-header` starts a diff block; diff lines accumulate until
  blank line or non-diff line

Edge case: a paragraph followed by a list item should stay as
separate blocks (not merge). ~parse-paragraph~ handles this because
it stops at the first line that is not classified as `:paragraph`,
whether or not a blank line comes first.

#+BEGIN_SRC lisp :tangle no
(defun parse-blocks (text)
  "Parse TEXT (a string) into a list of block-level markdown node plists.
Returns the nodes in reading order."
  (let ((lines (split-string-into-lines text))
        (nodes nil)
        (i 0))
    (loop while (< i (length lines))
          do (let* ((line (string-trim (list #\Return) (aref lines i)))
                    (classification (classify-line line)))
               (case (car classification)
                 (:blank (incf i))
                 (:thematic-break
                  (push (make-md-node :thematic-break) nodes)
                  (incf i))
                 (:paragraph
                  (multiple-value-bind (node consumed)
                      (parse-paragraph lines i)
                    (push node nodes)
                    (setf i consumed)))
                 (:heading
                  (let* ((level-and-content (cdr classification))
                         (level (car level-and-content))
                         (content (cdr level-and-content)))
                    (push (make-md-node :heading
                                        :properties (list :level level)
                                        :children (parse-inline content))
                          nodes)
                    (incf i)))
                 (:blockquote
                  (multiple-value-bind (node consumed)
                      (parse-blockquote lines i)
                    (push node nodes)
                    (setf i consumed)))
                 ((:list-item :ordered-item)
                  ;; parse-list returns a list of item nodes; splice them in.
                  (multiple-value-bind (items consumed)
                      (parse-list lines i
                                  (if (eq (car classification) :list-item)
                                      :unordered
                                      :ordered))
                    (dolist (item items)
                      (push item nodes))
                    (setf i consumed)))
                 (:code-start
                  (multiple-value-bind (node consumed)
                      (parse-code-block lines i (cdr classification))
                    (push node nodes)
                    (setf i consumed)))
                 (:diff-header
                  (multiple-value-bind (node consumed)
                      (parse-diff-block lines i)
                    (push node nodes)
                    (setf i consumed)))
                 (t
                  ;; E.g. a stray :diff-line outside a diff block: keep the
                  ;; text as a one-line paragraph instead of dropping it.
                  (push (make-md-node :paragraph
                                      :children (parse-inline line))
                        nodes)
                  (incf i)))))
    ;; Return in reading order
    (nreverse nodes)))
#+END_SRC
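
Step 3 of the algorithm (accumulating lines of the same type) can be
pictured with a simplified, self-contained sketch. ~sketch-group-runs~
is a hypothetical helper, not what ~parse-blocks~ actually calls: the
real parser interleaves classification and accumulation, while this
version groups an already classified list in one pass.

#+BEGIN_SRC lisp :tangle no
;; Hypothetical helper: group consecutive items that share the same key.
(defun sketch-group-runs (items &key (key #'identity))
  "Group consecutive ITEMS whose KEY values are EQL into sublists, preserving order."
  (let ((groups nil))
    (dolist (item items (nreverse (mapcar #'nreverse groups)))
      (if (and groups
               (eql (funcall key item)
                    (funcall key (first (first groups)))))
          ;; Same type as the current run: extend it.
          (push item (first groups))
          ;; Different type: start a new run.
          (push (list item) groups)))))
#+END_SRC

Called on classified lines with `:key #'car`, two `:paragraph` entries
followed by a `:blank` and two `:list-item` entries come back as three
groups.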

--- per-function: split-string-into-lines

~split-string-into-lines~ is a small hand-written utility, so we do
not need ~cl-ppcre~ (which we don't depend on). It splits on #\Newline,
strips the #\Return of CRLF line endings, and handles the edge case of
a trailing newline (doesn't produce an extra empty line at the end).

#+BEGIN_SRC lisp :tangle no
(defun split-string-into-lines (string)
  "Split STRING into a vector of lines (no trailing newline).
Handles \\n, \\r\\n, and trailing newlines properly."
  (let ((result nil)
        (start 0))
    (flet ((add-line (end)
             ;; Drop the #\Return left over from a CRLF line ending.
             (push (string-right-trim (list #\Return)
                                      (subseq string start end))
                   result)))
      ;; The loop variable is never modified by hand; CRLF is handled
      ;; in ADD-LINE instead of by skipping ahead.
      (loop for i from 0 below (length string)
            when (char= (char string i) #\Newline)
              do (add-line i)
                 (setf start (1+ i)))
      ;; Text after the last newline (if any) is the final line.
      (when (< start (length string))
        (add-line (length string)))
      (coerce (nreverse result) 'vector))))
#+END_SRC

--- per-function: parse-paragraph

~parse-paragraph~ collects one or more contiguous paragraph lines
until a blank line or a different block type. It joins them with
spaces (for hard-wrapped prose) and returns a :paragraph node
with inline-parsed children, plus the index of the next unconsumed
line.

Continuation lines in paragraphs are joined with a single space
(not a newline). This is correct for Markdown's soft-wrap
convention where a newline in source = space in output. To force
a hard break, CommonMark uses two trailing spaces; we skip that
for now since it's rare in TUI contexts.

#+BEGIN_SRC lisp :tangle no
(defun parse-paragraph (lines start)
  "Parse contiguous paragraph lines from LINES starting at START.
Returns two values: the :paragraph node and the index of the next
unconsumed line."
  (let ((text-parts nil)
        (i start))
    (loop while (< i (length lines))
          do (let* ((line (string-trim (list #\Return) (aref lines i)))
                    (class (classify-line line)))
               (case (car class)
                 (:paragraph
                  (push (cdr class) text-parts)
                  (incf i))
                 ;; A blank line ends the paragraph and is consumed with it.
                 (:blank (incf i) (loop-finish))
                 ;; Any other block type ends it without being consumed.
                 (t (loop-finish)))))
    ;; Join the collected lines with single spaces (soft wrap).
    (let ((text (format nil "~{~a~^ ~}" (nreverse text-parts))))
      (values (make-md-node :paragraph
                            :children (parse-inline text))
              i))))
#+END_SRC
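
For reference, this is one way the skipped hard-break rule could be
layered on top of the soft-wrap join. It is a hypothetical sketch, not
something the module does: ~sketch-join-lines~ is an invented name, and
emitting a #\Newline for a hard break is an assumption about how a
renderer would want to see it.

#+BEGIN_SRC lisp :tangle no
;; Hypothetical extension: soft-wrap join with CommonMark-style hard breaks.
(defun sketch-join-lines (lines)
  "Join LINES with spaces; a line ending in two spaces forces a newline instead."
  (with-output-to-string (out)
    (loop for (line . rest) on lines
          for hard-break = (and (>= (length line) 2)
                                (string= "  " line
                                         :start2 (- (length line) 2)))
          do (write-string (string-right-trim " " line) out)
             ;; Only emit a separator between lines, never after the last one.
             (when rest
               (write-char (if hard-break #\Newline #\Space) out)))))
#+END_SRC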

--- per-function: parse-blockquote

~parse-blockquote~ collects contiguous `>` lines, strips the `>`
prefix, joins them, and wraps in a :blockquote node. Nested
blockquotes (`> >`) are not supported in this version: a `>` at
the start of the content is treated as literal text.

#+BEGIN_SRC lisp :tangle no
(defun parse-blockquote (lines start)
  "Parse contiguous blockquote lines from LINES starting at START.
Returns two values: the :blockquote node and the index of the next
unconsumed line."
  (let ((text-parts nil)
        (i start))
    (loop while (< i (length lines))
          do (let* ((line (string-trim (list #\Return) (aref lines i)))
                    (class (classify-line line)))
               (case (car class)
                 ;; classify-line has already stripped the > prefix.
                 (:blockquote
                  (push (cdr class) text-parts)
                  (incf i))
                 (:blank (incf i) (loop-finish))
                 (t (loop-finish)))))
    ;; Join the quoted lines with single spaces, as for paragraphs.
    (let ((text (format nil "~{~a~^ ~}" (nreverse text-parts))))
      (values (make-md-node :blockquote
                            :children (parse-inline text))
              i))))
#+END_SRC

--- per-function: parse-list

~parse-list~ collects contiguous list items and returns a list of
nodes plus the index of the next unconsumed line. Each line starting
with a list marker becomes one list-item node. Nested lists are not
supported (lines starting with two spaces + marker would be the next
level; we skip that for v1).

The TYPE parameter is either `:unordered` or `:ordered`. It is
currently ignored: each item is labeled by its actual marker type
since we already classified each line, so a run that mixes `-` and
`1.` markers yields both `:list-item` and `:ordered-item` nodes.

#+BEGIN_SRC lisp :tangle no
(defun parse-list (lines start type)
  "Parse contiguous list items from LINES starting at START.
TYPE is :unordered or :ordered (currently ignored).
Returns two values: a list of :list-item / :ordered-item nodes and the
index of the next unconsumed line."
  (declare (ignore type))
  (let ((items nil)
        (i start))
    ;; Collect all contiguous list items into ITEMS
    (loop while (< i (length lines))
          do (let* ((line (string-trim (list #\Return) (aref lines i)))
                    (class (classify-line line)))
               (case (car class)
                 ((:list-item :ordered-item)
                  (push class items)
                  (incf i))
                 (:blank
                  ;; One blank line between items is OK; a blank line
                  ;; followed by anything else ends the list.
                  (incf i)
                  (unless (and (< i (length lines))
                               (member (car (classify-line
                                             (string-trim (list #\Return)
                                                          (aref lines i))))
                                       '(:list-item :ordered-item)))
                    (loop-finish)))
                 (t (loop-finish)))))
    ;; Convert each non-empty item to a node
    (values (loop for (kind . content) in (nreverse items)
                  unless (string= content "")
                    collect (make-md-node kind
                                          :children (parse-inline content)))
            i)))
#+END_SRC
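
The parser helpers are consumed with ~multiple-value-bind~, so they
need to return two values rather than a cons. The sketch below shows
the convention on an unrelated, hypothetical helper
(~sketch-take-digits~ is not part of the module), together with what
happens when a cons is returned by mistake.

#+BEGIN_SRC lisp :tangle no
;; Hypothetical helper following the "result + next index" convention.
(defun sketch-take-digits (string start)
  "Return two values: the integer read from STRING at START (or NIL if
there are no digits) and the index just after it."
  (let ((end (or (position-if-not #'digit-char-p string :start start)
                 (length string))))
    (values (when (> end start)
              (parse-integer string :start start :end end))
            end)))

;; Correct: both values are bound.
(multiple-value-bind (number next) (sketch-take-digits "42. item" 0)
  (list number next))                   ; => (42 2)

;; Pitfall: a cons is a single value, so the second variable is NIL.
(multiple-value-bind (node next) (cons :node 7)
  (list node next))                     ; => ((:NODE . 7) NIL)
#+END_SRC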

--- per-function: parse-code-block

~parse-code-block~ reads from the line after the opening fence to
the closing fence (or end of input). It returns a :code-block node
with the language (or NIL) and the raw text as the :content. No
inline parsing is done inside code blocks; everything is literal.

Matching fence: if opened with `` ``` ``, close with `` ``` ``.
If opened with `~~~`, close with `~~~`. CommonMark requires the
closing fence to have at least as many backticks/tildes as the
opening fence. We use a simpler rule: same character, same count,
optionally followed by spaces or tabs.
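
The simplified closing-fence rule can be written as a standalone
predicate. This is a minimal sketch: ~sketch-closing-fence-p~ is a
hypothetical name, and it assumes the caller already knows the fence
character and length from the opening line.

#+BEGIN_SRC lisp :tangle no
;; Hypothetical predicate for the "same character, same count" rule.
(defun sketch-closing-fence-p (line fence-char fence-len)
  "True if LINE is exactly FENCE-LEN copies of FENCE-CHAR plus optional trailing blanks."
  (let ((trimmed (string-right-trim '(#\Space #\Tab) line)))
    (and (= (length trimmed) fence-len)
         (every (lambda (c) (char= c fence-char)) trimmed))))
#+END_SRC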

#+BEGIN_SRC lisp :tangle no
(defun parse-code-block (lines start lang)
  "Parse a fenced code block from LINES starting at START.
LANG is the language string (or empty string) from the opening fence.
Returns (node . consumed-index)."
  (let ((code-lines nil)
        (i (1+ start))
        (fence-char (char (aref lines start) 0))
        (fence-len (loop for c across (aref lines start)
                         while (char= c (char (aref lines start) 0))
                         count c))
        (found-close nil))
    (loop while (< i (length lines))
          do (let* ((raw-line (aref lines i))
                    (line (string-trim (list #\return) raw-line)))
               ;; Check for closing fence
               (when (and (>= (length line) fence-len)
                          (every (lambda (c) (char= c fence-char))
                                 (subseq line 0 fence-len))
                          (or (= (length line) fence-len)
                              ;; "\t" is not a tab in a Lisp string literal,
                              ;; so compare against the characters directly.
                              (every (lambda (c) (find c '(#\Space #\Tab)))
                                     (subseq line fence-len))))
                 (setf found-close t)
                 (incf i)
                 (loop-finish))