amr/cl-tty

Hermes 9648c72b85 v0.8.0: Markdown + Code + Diff rendering module

Add cl-tui.markdown package with:
- Markdown parser: headings, paragraphs, bold, italic, inline-code, links,
  code blocks, blockquotes, lists, thematic breaks
- Syntax highlighting: Lisp, Python, JavaScript, Bash with keyword,
  builtin, comment, number, function coloring
- Diff renderer: colorized unified diff (+/-/@ lines)
- Terminal renderer: ANSI escape sequences via backend-style functions
- 67 tests, 100% passing
- All parser helpers use values returns (not cons) for multiple-value-bind

ASDF: v0.7.0 -> v0.8.0, new markdown module + test suite

2026-05-11 18:26:34 +00:00

20 KiB

Markdown + Code + Diff Rendering (v0.8.0)

Overview
- Design decisions
Code structure
- Node types
- Block-level parser

Overview

This module provides rendering of Markdown text, syntax-highlighted code blocks, and unified diffs in the terminal. It completes the rendering pipeline so that the render tree can handle rich formatted content.

The Markdown renderer is not a general-purpose Markdown-to-HTML converter. It targets TUI output and models only node types that have a clear terminal analogue (headings → bold/bright text, code blocks → a monochrome block, bold → ANSI bold, and so on). Edge cases that matter in a terminal (long lines, escape sequences inside code, mixed formatting) are handled explicitly.

Design decisions

Two-phase parse: block-level first (lines), then inline (characters within each block). This matches how terminals render — block layout first, style within.
Syntax highlighting by keyword set: not a full lexer. A lookup table maps each language to its keyword, type, and builtin sets. By our estimate this catches ~90% of highlighting cases without pulling in a parser, and it fails safe: unmatched tokens render as plain text.
Diff lines are self-describing: a diff block starts with a `---` or `+++` header line, and every line after it is colored by its first character (`+`, `-`, or `@` for hunk headers). We don't re-parse patch semantics; we just color by prefix. This makes the renderer tolerant of malformed diffs.
No recursive descent parser: a simple state machine over lines for block-level, and a character cursor for inline. Keeps the code short and avoids parser-generator dependencies.
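The keyword-set idea can be sketched as follows, assuming a hash table keyed by language name whose values are per-class word lists. The names `*highlight-table*`, `word-char-p`, `token-class`, and `highlight-line`, and the word lists themselves, are hypothetical illustrations rather than this module's API; only keyword, builtin, and number classes are shown (comments and function names are left out).

```lisp
;; Hypothetical sketch of keyword-set highlighting (not the module's real API).
(defparameter *highlight-table*
  (let ((table (make-hash-table :test #'equal)))
    ;; language name -> plist of word lists, one per highlight class
    (setf (gethash "python" table)
          '(:keyword ("def" "return" "if" "else" "for" "while" "import")
            :builtin ("print" "len" "range")))
    (setf (gethash "lisp" table)
          '(:keyword ("defun" "let" "lambda" "if" "cond" "loop")
            :builtin ("car" "cdr" "list" "format")))
    table)
  "Illustrative lookup table: language -> word sets per highlight class.")

(defun word-char-p (c)
  "True for characters that can appear inside an identifier-like token."
  (or (alphanumericp c) (char= c #\_)))

(defun token-class (token language)
  "Classify TOKEN for LANGUAGE; unknown languages and words fall back to :plain."
  (let ((sets (gethash language *highlight-table*)))
    (cond ((and (plusp (length token)) (every #'digit-char-p token)) :number)
          ((member token (getf sets :keyword) :test #'string=) :keyword)
          ((member token (getf sets :builtin) :test #'string=) :builtin)
          (t :plain))))

(defun highlight-line (line language)
  "Split LINE into (CLASS . TEXT) runs. Word runs are looked up in the
table; punctuation and whitespace runs are always :plain."
  (let ((runs nil)
        (start 0)
        (n (length line)))
    (loop while (< start n)
          do (let* ((wordp (word-char-p (char line start)))
                    ;; End of the current run: first char of the other kind
                    (end (or (position-if (lambda (c)
                                            (if wordp
                                                (not (word-char-p c))
                                                (word-char-p c)))
                                          line :start start)
                             n))
                    (text (subseq line start end)))
               (push (cons (if wordp (token-class text language) :plain) text)
                     runs)
               (setf start end)))
    (nreverse runs)))
```

Because every character lands in exactly one run, concatenating the run texts reproduces the input line. An unmatched token therefore degrades to plain text instead of disappearing, which is the fail-safe property described above.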

Code structure

Node types

We represent the parsed document as a tree of plists. Each node has at least a `:type` key. Block-level nodes carry a `:children` list of inline nodes. This keeps the data structure simple — no class hierarchy, no generic dispatch — while being easy to traverse for rendering.

Node types:

| Block-level | Inline |
|---|---|
| `:heading` | `:text` |
| `:paragraph` | `:bold` |
| `:code-block` | `:italic` |
| `:blockquote` | `:inline-code` |
| `:list-item` | `:link` |
| `:ordered-item` | |
| `:thematic-break` | |
| `:diff-block` | |
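As an illustration, here is a hand-built tree for `# Hello **world**` followed by a thematic break, plus a small depth-first walker. The exact shapes are assumptions for this sketch: `:text` leaves keep their string under `:content` (as md-node-text below reads it), headings carry `:level` inside `:properties`, and `*example-doc*` and `node-types` are hypothetical names.

```lisp
;; Hypothetical example tree: "# Hello **world**" followed by "---".
;; Assumes :text leaves store their string under :content.
(defparameter *example-doc*
  (list (list :type :heading
              :properties (list :level 1)
              :children (list (list :type :text :content "Hello ")
                              (list :type :bold
                                    :children (list (list :type :text
                                                          :content "world")))))
        (list :type :thematic-break)))

(defun node-types (node)
  "Return the :type keywords of NODE and all its descendants, depth-first."
  (cons (getf node :type)
        ;; NODE-TYPES always returns a fresh list, so MAPCAN may splice it
        (mapcan #'node-types (getf node :children))))
```

Calling `(mapcan #'node-types *example-doc*)` walks every block in reading order. No dispatch is needed because each node's `:type` and `:children` are plain `getf` lookups.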

— per-function: markdown-node-make

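make-md-node is a convenience constructor for node plists. It adds the `:children` and `:properties` keys only when they are non-NIL, so a leaf node stays a two-element plist such as `(:type :thematic-break)`. Because `getf` returns NIL for a missing key, renderers can fetch `(getf node :children)` and test it with a plain `(if children …)`, whether or not the key is present.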

(defun make-md-node (type &key children properties)
  "Create a markdown node plist.
TYPE is a keyword like :heading or :bold.
CHILDREN is a list of inline node plists (or NIL).
PROPERTIES is a plist of node-specific extra keys (e.g. :level for headings)."
  (let ((node (list :type type)))
    (when children
      (setf (getf node :children) children))
    (when properties
      (setf (getf node :properties) properties))
    node))

— per-function: markdown-node-p

md-node-p checks whether something is a markdown node plist. We just look for a :type key. This is used in tests and as a guard in recursive renderers.

(defun md-node-p (thing)
  "Return true (THING's :type keyword) if THING is a markdown node plist."
  (and (listp thing) (getf thing :type)))

— per-function: markdown-node-text

md-node-text extracts the plain text from a node tree by recursively concatenating the `:content` of all `:text` nodes and discarding markup; a `:link` contributes its text followed by its URL in parentheses. This is useful for things like heading anchors, tooltip strings, or search indexing.

(defun md-node-text (node)
  "Recursively extract plain text from a markdown node tree."
  (let ((type (getf node :type)))
    (cond ((eql type :text)
           (or (getf node :content) ""))
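          ((eql type :link)
           ;; Link text (from all children, so formatted link text is kept),
           ;; then the URL in parentheses
           (concatenate 'string
                        (apply #'concatenate 'string
                               (mapcar #'md-node-text (getf node :children)))
                        (format nil " (~a)" (or (getf node :url) ""))))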
          ((getf node :children)
           (apply #'concatenate 'string
                  (mapcar #'md-node-text (getf node :children))))
          (t ""))))
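A worked example on a paragraph that mixes bold text and a link. The snippet includes its own copy of md-node-text so it runs standalone, and it assumes, as the function itself does, that `:text` nodes keep their string under `:content` and links keep their target under `:url`; `*para*` is a hypothetical name.

```lisp
(defun md-node-text (node)
  "Recursively extract plain text from a markdown node tree."
  (let ((type (getf node :type)))
    (cond ((eql type :text)
           (or (getf node :content) ""))
          ((eql type :link)
           ;; Link text followed by the URL in parentheses
           (concatenate 'string
                        (apply #'concatenate 'string
                               (mapcar #'md-node-text (getf node :children)))
                        (format nil " (~a)" (or (getf node :url) ""))))
          ((getf node :children)
           (apply #'concatenate 'string
                  (mapcar #'md-node-text (getf node :children))))
          (t ""))))

;; "See **the** [docs](https://example.com)." as a node tree
(defparameter *para*
  '(:type :paragraph
    :children ((:type :text :content "See ")
               (:type :bold :children ((:type :text :content "the")))
               (:type :text :content " ")
               (:type :link :url "https://example.com"
                :children ((:type :text :content "docs")))
               (:type :text :content "."))))
```

Here `(md-node-text *para*)` drops the bold markup and keeps the URL, giving `"See the docs (https://example.com)."`. A node with neither text nor children, such as a thematic break, contributes the empty string.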

Block-level parser

The block parser operates line-by-line with a simple state machine. Each line is classified by its prefix characters, then accumulated into a node.

Rules:

Lines starting with `#` → heading (count hashes for level)
Lines starting with `>` → blockquote (continuation lines merge)
Lines starting with `-`, `*`, or `+` → list-item
Lines starting with 1-3 digits followed by `.` or `)` and a space → ordered-item
Lines starting with `` ``` `` or `~~~` → code-block (language on opening line)
Lines consisting of three or more `-` or `*` (optionally space-separated), e.g. `---` or `***` → thematic-break
Lines starting with `--- ` or `+++ ` → diff-block
Empty lines → paragraph boundary
Everything else → paragraph (continuation lines merge until blank)

— per-function: classify-line

classify-line returns a cons of a keyword and a data value for a single line of text (with any trailing carriage return already removed). The state machine uses this to decide what kind of block to create or continue.

The function must handle prefix stripping (e.g. remove `# ` after counting hashes) and edge cases like `#` inside a code block (which we don't classify at all — the code block state machine handles that).

One trap: a line like `#not-a-heading` (no space after hash) is NOT a heading in CommonMark. We check for space/tab after the hashes.
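The heading rule can be isolated as a small predicate. This is a sketch; `atx-heading-level` is a hypothetical helper, not part of the module. It counts leading hashes and accepts only 1-6 of them when they are followed by a space, a tab, or the end of the line. Checking for end of line before indexing matters: a bare `###` is a valid empty heading, and reading the character after it would run off the string.

```lisp
(defun atx-heading-level (line)
  "Hypothetical helper: return the heading level (1-6) if LINE starts with an
ATX heading marker, i.e. 1-6 #s followed by a space, a tab, or end of line;
otherwise return NIL."
  (let ((count (or (position-if-not (lambda (c) (char= c #\#)) line)
                   (length line))))
    (and (<= 1 count 6)
         ;; Test end of line first so (char line count) stays in bounds
         (or (= count (length line))
             (member (char line count) '(#\Space #\Tab)))
         count)))
```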

Another trap: `* item` in a list vs `**bold**` inline. At the block-parser level we only treat a line-start `* ` (star + space) as a list marker. A line starting with `** text` could be either a nested list item or bold text in a paragraph; we conservatively treat it as paragraph text, and the inline parser handles the `**` as usual.

(defun classify-line (line)
  "Classify LINE (trailing #\\Return already removed), returning (TYPE . DATA).
TYPE is a keyword. DATA is (LEVEL . CONTENT) for :heading, the info string
(language) for :code-start, (PREFIX-CHAR . REST) for :diff-line, the text
after the marker for :blockquote and list items, NIL for :blank and
:thematic-break, and the whole line otherwise."
  (flet ((trim (s)
           (string-trim '(#\Space #\Tab) s))
         (run-length (ch)
           ;; Number of copies of CH at the start of LINE
           (or (position-if-not (lambda (c) (char= c ch)) line)
               (length line))))
    (cond
      ;; Empty line
      ((string= line "") (cons :blank nil))
      ;; Thematic break: 3+ of the same - or * char, optionally space-separated
      ((and (find (char line 0) "-*")
            (>= (count (char line 0) line) 3)
            (every (lambda (c) (or (char= c (char line 0))
                                   (char= c #\Space)
                                   (char= c #\Tab)))
                   line))
       (cons :thematic-break nil))
      ;; Heading: 1-6 hashes followed by a space/tab or end of line,
      ;; so "#not-a-heading" is not a heading
      ((and (char= (char line 0) #\#)
            (let ((count (run-length #\#)))
              (and (<= count 6)
                   (or (= count (length line))
                       (member (char line count) '(#\Space #\Tab))))))
       (let ((level (run-length #\#)))
         (cons :heading (cons level (trim (subseq line level))))))
      ;; Blockquote: >
      ((char= (char line 0) #\>)
       (cons :blockquote (trim (subseq line 1))))
      ;; Unordered list: -, * or + followed by a space
      ((and (>= (length line) 2)
            (find (char line 0) "-*+")
            (char= (char line 1) #\Space))
       (cons :list-item (trim (subseq line 2))))
      ;; Ordered list: 1-3 digits, then . or ), then a space
      ((let ((ndigits (or (position-if-not #'digit-char-p line)
                          (length line))))
         (and (<= 1 ndigits 3)
              (< (1+ ndigits) (length line))
              (find (char line ndigits) ".)")
              (char= (char line (1+ ndigits)) #\Space)))
       (let ((ndigits (position-if-not #'digit-char-p line)))
         (cons :ordered-item (trim (subseq line (+ ndigits 2))))))
      ;; Diff header: "--- file" or "+++ file"
      ((and (>= (length line) 4)
            (find (char line 0) "-+")
            (char= (char line 1) (char line 0))
            (char= (char line 2) (char line 0))
            (char= (char line 3) #\Space))
       (cons :diff-header line))
      ;; Diff content line with a single +/- prefix
      ((and (find (char line 0) "-+")
            (not (and (>= (length line) 3)
                      (char= (char line 1) (char line 0))
                      (char= (char line 2) (char line 0)))))
       (cons :diff-line (cons (char line 0) (subseq line 1))))
      ;; Fenced code block opener: 3+ backticks or tildes, optional info string
      ((and (find (char line 0) "`~")
            (>= (run-length (char line 0)) 3))
       (cons :code-start (trim (subseq line (run-length (char line 0))))))
      ;; Default: paragraph content
      (t (cons :paragraph line)))))

— per-function: parse-blocks

parse-blocks is the main block-level parser. It takes a string (possibly multi-line) and returns a list of markdown node plists.

The algorithm:

Split into lines
Classify each line
Accumulate lines of the same type into groups
Convert each group into a node

State transitions:

`:paragraph` accumulates until blank line or different block type
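`:blockquote` accumulates until a blank line or a line that does not start with `>`
`:list-item` and `:ordered-item` accumulate across single blank lines; the list ends at a blank line that is not followed by another item, or at any non-list line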
`:code-start` flips to code-block mode; accumulates until matching fence closer or end of input
`:diff-header` starts a diff block; diff lines accumulate until blank line or non-diff line

Edge case: a paragraph followed directly by a list item should stay as separate blocks (not merge). This works because parse-paragraph only continues while lines classify as `:paragraph`; a list line, or any other block type, ends the paragraph even without a blank line in between.
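The split/classify/group loop can be sketched in isolation with a toy classifier. `toy-class`, `line-run`, and `group-lines` are hypothetical stand-ins (the real parser uses classify-line and the per-type helpers), but the control flow has the same shape: a helper returns the parsed run and the next unconsumed index as two values, and the driver picks them up with `multiple-value-bind`.

```lisp
(defun toy-class (line)
  "Hypothetical, much-simplified stand-in for classify-line."
  (cond ((string= line "") :blank)
        ((and (>= (length line) 2) (string= line "- " :end1 2)) :list-item)
        (t :paragraph)))

(defun line-run (lines start class)
  "Return the lines from START that share CLASS, and the index just past them."
  (let ((end (or (position-if-not (lambda (line) (eq (toy-class line) class))
                                  lines :start start)
                 (length lines))))
    (values (coerce (subseq lines start end) 'list) end)))

(defun group-lines (lines)
  "Group consecutive LINES (a vector of strings) into (CLASS . LINES) runs.
Blank lines only act as boundaries and are dropped."
  (let ((groups nil)
        (i 0))
    (loop while (< i (length lines))
          do (let ((class (toy-class (aref lines i))))
               (if (eq class :blank)
                   (incf i)
                   ;; Consume one run; the helper reports where it stopped
                   (multiple-value-bind (run next) (line-run lines i class)
                     (push (cons class run) groups)
                     (setf i next)))))
    (nreverse groups)))
```

With `#("Intro line" "more intro" "- a" "- b" "" "Outro")` this yields three groups: a two-line paragraph, a two-item list, and a one-line paragraph. The paragraph and the list stay separate even though no blank line divides them.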

(defun parse-blocks (text)
  "Parse TEXT (a string) into a list of block-level markdown node plists,
in reading order. The PARSE-* helpers each return two values: the parsed
node (a list of item nodes for PARSE-LIST) and the index of the first line
they did not consume."
  (let ((lines (split-string-into-lines text))
        (nodes nil)
        (i 0))
    (loop while (< i (length lines))
          do (let* ((line (string-trim (list #\return) (aref lines i)))
                    (classification (classify-line line)))
               (case (car classification)
                 (:blank (incf i))
                 (:thematic-break
                  (push (make-md-node :thematic-break) nodes)
                  (incf i))
                 (:paragraph
                  (multiple-value-bind (node consumed)
                      (parse-paragraph lines i)
                    (push node nodes)
                    (setf i consumed)))
                 (:heading
                  (let* ((level-and-content (cdr classification))
                         (level (car level-and-content))
                         (content (cdr level-and-content)))
                    (push (make-md-node :heading
                                        :properties (list :level level)
                                        :children (parse-inline content))
                          nodes)
                    (incf i)))
                 (:blockquote
                  (multiple-value-bind (node consumed)
                      (parse-blockquote lines i)
                    (push node nodes)
                    (setf i consumed)))
                 ((:list-item :ordered-item)
                  ;; PARSE-LIST returns a list of item nodes: splice them in
                  (multiple-value-bind (items consumed)
                      (parse-list lines i
                                  (if (eq (car classification) :list-item)
                                      :unordered
                                      :ordered))
                    (dolist (item items)
                      (push item nodes))
                    (setf i consumed)))
                 (:code-start
                  (multiple-value-bind (node consumed)
                      (parse-code-block lines i (cdr classification))
                    (push node nodes)
                    (setf i consumed)))
                 (:diff-header
                  (multiple-value-bind (node consumed)
                      (parse-diff-block lines i)
                    (push node nodes)
                    (setf i consumed)))
                 (t
                  ;; Anything else (e.g. a stray +/- line outside a diff
                  ;; block) becomes a one-line paragraph instead of being
                  ;; silently dropped
                  (push (make-md-node :paragraph
                                      :children (parse-inline line))
                        nodes)
                  (incf i)))))
    ;; Return in reading order
    (nreverse nodes)))

— per-function: split-string-into-lines

We implement split-string-into-lines ourselves rather than relying on cl-ppcre, which we don't depend on. It splits on #\Newline, strips the #\Return left by CRLF line endings, and handles trailing newlines: a final newline does not produce an extra empty line at the end.

(defun split-string-into-lines (string)
  "Split STRING into a vector of lines without line terminators.
Handles LF and CRLF endings; a trailing newline does not produce an extra
empty line at the end."
  (let ((lines nil)
        (start 0))
    ;; Cut at every #\Newline
    (loop for pos = (position #\Newline string :start start)
          while pos
          do (push (subseq string start pos) lines)
             (setf start (1+ pos)))
    ;; Text after the last newline (none if STRING ends with a newline)
    (when (< start (length string))
      (push (subseq string start) lines))
    ;; Drop the #\Return left over from CRLF endings
    (map 'vector
         (lambda (line)
           (let ((n (length line)))
             (if (and (plusp n) (char= (char line (1- n)) #\Return))
                 (subseq line 0 (1- n))
                 line)))
         (nreverse lines))))

— per-function: parse-paragraph

parse-paragraph collects one or more contiguous paragraph lines until a blank line or a different block type. It joins them with spaces (for hard-wrapped prose) and returns a :paragraph node with inline-parsed children.

Continuation lines in paragraphs are joined with a single space (not a newline). This is correct for Markdown's soft-wrap convention where a newline in source = space in output. To force a hard break, CommonMark uses two trailing spaces — we skip that for now since it's rare in TUI contexts.

(defun parse-paragraph (lines start)
  "Parse contiguous paragraph lines from LINES starting at START.
Returns two values: the :paragraph node and the index of the first
unconsumed line."
  (let ((text-parts nil)
        (i start))
    (loop while (< i (length lines))
          do (let* ((raw-line (aref lines i))
                    (line (string-trim (list #\return) raw-line))
                    (class (classify-line line)))
               (case (car class)
                 (:paragraph
                  (push (cdr class) text-parts)
                  (incf i))
                 (:blank (incf i) (loop-finish))
                 (t (loop-finish)))))
    ;; Join soft-wrapped source lines with single spaces
    (let ((text (with-output-to-string (s)
                  (loop for part in (nreverse text-parts)
                        for first = t then nil
                        do (unless first (write-char #\Space s))
                           (princ part s)))))
      (values (make-md-node :paragraph
                            :children (parse-inline text))
              i))))
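The space-joining loop above (repeated in parse-blockquote) could equally be written with FORMAT's list-iteration directive. A sketch, with `join-soft-wrapped` as a hypothetical helper name: `~{…~}` iterates over the list, and `~^` exits before emitting the separator after the last element.

```lisp
(defun join-soft-wrapped (parts)
  "Join PARTS (a list of strings) with single spaces, as Markdown soft
wraps do. Returns the empty string for an empty list."
  ;; ~{ ~} loops over PARTS; ~^ stops before the trailing space
  (format nil "~{~a~^ ~}" parts))
```

Using one helper in both parsers would keep the joining rule in a single place if hard breaks are added later.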

— per-function: parse-blockquote

parse-blockquote collects contiguous `>` lines, strips the `>` prefix, joins them, and wraps in a :blockquote node. Nested blockquotes (`> >`) are not supported in this version — a `>` at the start of the content is treated as literal text.

(defun parse-blockquote (lines start)
  "Parse contiguous blockquote lines from LINES starting at START.
Returns two values: the :blockquote node and the index of the first
unconsumed line."
  (let ((text-parts nil)
        (i start))
    (loop while (< i (length lines))
          do (let* ((raw-line (aref lines i))
                    (line (string-trim (list #\return) raw-line))
                    (class (classify-line line)))
               (case (car class)
                 (:blockquote
                  (push (cdr class) text-parts)
                  (incf i))
                 (:blank (incf i) (loop-finish))
                 (t (loop-finish)))))
    ;; Join the quoted lines (with their > stripped) using single spaces
    (let ((text (with-output-to-string (s)
                  (loop for part in (nreverse text-parts)
                        for first = t then nil
                        do (unless first (write-char #\Space s))
                           (princ part s)))))
      (values (make-md-node :blockquote
                            :children (parse-inline text))
              i))))

— per-function: parse-list

parse-list collects contiguous list items (ordered and unordered markers alike) and returns them as a list of nodes, one per line that starts with a list marker. Nested lists are not supported: an indented line such as `  - sub` classifies as paragraph text and ends the list. Nesting is deferred past v1.

The TYPE parameter (`:unordered` or `:ordered`) is currently ignored: each item is labeled by its own marker type, since every line has already been classified.

(defun parse-list (lines start type)
  "Parse contiguous list items from LINES starting at START.
TYPE is :unordered or :ordered; it is currently unused, since each item
keeps its own marker type. Returns two values: a list of :list-item and
:ordered-item nodes, and the index of the first unconsumed line."
  (declare (ignore type))
  (let ((items nil)
        (i start))
    ;; Collect contiguous items as (KIND . CONTENT) pairs from classify-line
    (loop while (< i (length lines))
          do (let* ((raw-line (aref lines i))
                    (line (string-trim (list #\return) raw-line))
                    (class (classify-line line)))
               (case (car class)
                 ((:list-item :ordered-item)
                  (push class items)
                  (incf i))
                 (:blank
                  ;; A blank line followed by another item continues the
                  ;; list; any other blank line ends it
                  (incf i)
                  (unless (and (< i (length lines))
                               (member (car (classify-line
                                             (string-trim (list #\return)
                                                          (aref lines i))))
                                       '(:list-item :ordered-item)))
                    (loop-finish)))
                 (t (loop-finish)))))
    ;; Convert each non-empty item to a node, preserving order
    (let ((nodes nil))
      (dolist (item (nreverse items))
        (destructuring-bind (kind . content) item
          (when (and content (string/= content ""))
            (push (make-md-node kind :children (parse-inline content))
                  nodes))))
      (values (nreverse nodes) i))))

— per-function: parse-code-block

parse-code-block reads from the line after the opening fence to the closing fence (or end of input). It returns a :code-block node with the language (or NIL) and the raw text as the :content. No inline parsing is done inside code blocks — everything is literal.

Matching fence: if opened with `` ``` ``, close with `` ``` ``; if opened with `~~~`, close with `~~~`. In CommonMark the closing fence must use the same character and have at least as many backticks/tildes as the opening fence. We use a simpler, stricter rule: the same character, exactly the same count, optionally followed by spaces or tabs.
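The simplified closing rule fits in a small predicate. This is a sketch; `closing-fence-p` is a hypothetical helper. Whitespace is tested with character objects because in a Lisp string literal `"\t"` reads as just `t`, so a string such as `" \t"` would match the letter `t` rather than a tab.

```lisp
(defun closing-fence-p (line fence-char fence-len)
  "Hypothetical helper: true if LINE closes a fence opened with FENCE-LEN
copies of FENCE-CHAR under the simplified rule: the same character, exactly
the same count, then only spaces or tabs."
  (let ((run (or (position-if-not (lambda (c) (char= c fence-char)) line)
                 (length line))))
    (and (= run fence-len)
         ;; Character objects, not "\t": in a Lisp string that is just "t"
         (every (lambda (c) (member c '(#\Space #\Tab)))
                (subseq line run)))))
```

Under this rule a four-backtick line does not close a three-backtick block, and neither does a line carrying an info string, such as a second opening fence.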

#+BEGIN_SRC lisp :tangle no
(defun parse-code-block (lines start lang)
  "Parse a fenced code block from LINES starting at START.
LANG is the language string (or empty string) from the opening fence.
Returns two values: the :code-block node and the index of the first line
after the block."
  (let ((code-lines nil)
        (i (1+ start))
        (fence-char (char (aref lines start) 0))
        (fence-len (loop for c across (aref lines start)
                         while (char= c (char (aref lines start) 0))
                         count c))
        (found-close nil))
    (loop while (< i (length lines))
          do (let* ((raw-line (aref lines i))
                    (line (string-trim (list #\return) raw-line)))
               ;; Check for closing fence: same char, same count,
               ;; then only spaces/tabs
               (when (and (>= (length line) fence-len)
                          (every (lambda (c) (char= c fence-char))
                                 (subseq line 0 fence-len))
                          (or (= (length line) fence-len)
                              (every (lambda (c) (member c '(#\Space #\Tab)))
                                     (subseq line fence-len))))
                 (setf found-close t)
                 (incf i)
                 (loop-finish))
