64 KiB
Markdown, Syntax Highlighting, and Diff Rendering
- Overview
- Implementation
- Tests
Overview
Markdown parser with inline formatting, code block syntax highlighting,
and diff rendering. Self-contained in cl-tty.markdown package.
Implementation
Package
(defpackage :cl-tty.markdown
(:use :cl)
(:export
#:make-md-node #:md-node-p #:md-node-text
#:parse-blocks #:parse-inline
#:highlight-code
#:classify-diff-line #:render-md #:render-md-node
#:render-markdown #:render-inline
#:apply-style #:apply-styles))
Main module
The main module file header includes the package declaration and a
comment indicating the file's purpose. This block is the first to
target markdown.lisp and thus overwrites any previous content;
all subsequent blocks append.
;;; markdown.lisp — Markdown + Code + Diff rendering for cl-tty
(in-package :cl-tty.markdown)
Node constructors
Node constructors provide a uniform way to build the AST for parsed
Markdown. Using plists (property lists) with a :type key gives us
flexibility — we can attach arbitrary metadata without a rigid class
hierarchy, which keeps the parser simple and the data easy to
introspect from the REPL.
make-md-node
make-md-node is the primary constructor. It accepts a required type
symbol and optional keyword arguments for children, properties,
content, and url. Only non-nil slots are stored, keeping the
plist compact.
(defun make-md-node (type &key children properties content url)
(let ((node (list :type type)))
(when children (setf (getf node :children) children))
(when properties (setf (getf node :properties) properties))
(when content (setf (getf node :content) content))
(when url (setf (getf node :url) url))
node))
md-node-p
Predicate that checks whether a value is an AST node by verifying it
is a list and has a :type property. This uses plist access which
bypasses the need for typep or class-based dispatch.
(defun md-node-p (thing)
(and (listp thing) (getf thing :type)))
md-node-text
md-node-text recursively extracts the plain-text representation of a
node tree. The :link type formats as text (url); :text and
:inline-code return their content directly; other container types
concatenate their children's text. This is useful for summarisation
and testing.
(defun md-node-text (node)
(let ((type (getf node :type)))
(cond ((eql type :text) (or (getf node :content) ""))
((eql type :link)
(concatenate 'string
(md-node-text (first (getf node :children)))
(format nil " (~a)" (or (getf node :url) ""))))
((eql type :inline-code) (or (getf node :content) ""))
((getf node :children)
(apply #'concatenate 'string
(mapcar #'md-node-text (getf node :children))))
(t ""))))
Block-level parser
The block parser splits raw text into lines and classifies each line
to determine what kind of block structure it begins. Helper functions
keep the main parse-blocks dispatch manageable.
split-string-into-lines
Handles CRLF, LF, and missing trailing newline uniformly.
Returns a vector for fast indexed access by line number during
parsing. Returns an empty vector for nil input.
(defun split-string-into-lines (string)
(unless string (return-from split-string-into-lines (coerce nil 'vector)))
(let ((result nil) (start 0))
(flet ((add-line (end) (push (subseq string start end) result)))
(loop for i from 0 below (length string)
do (let ((c (char string i)))
(cond ((char= c #\Newline) (add-line i) (setf start (1+ i)))
((and (char= c #\Return) (< (1+ i) (length string))
(char= (char string (1+ i)) #\Newline))
(add-line i) (setf start (+ i 2)) (incf i)))))
(when (< start (length string)) (add-line (length string)))
(coerce (nreverse result) 'vector))))
classify-line
The core line classification function. It checks line prefixes in
priority order — blank lines, thematic breaks, ATX headings, blockquote
markers, unordered/ordered list items, diff headers, diff lines, and
fenced code-block starts — and returns a (cons type data) pair.
Everything else is treated as a paragraph continuation line.
(defun classify-line (line)
(cond
((string= line "") (cons :blank nil))
((and (>= (length line) 3)
(let ((c0 (char line 0)))
(and (find c0 "-*")
(every (lambda (c) (or (char= c c0) (char= c #\Space) (char= c #\Tab)))
line))))
(cons :thematic-break nil))
((and (char= (char line 0) #\#)
(let ((count 0))
(loop for c across line while (char= c #\#) do (incf count))
(and (<= 1 count 6)
(or (>= (length line) (1+ count))
(member (char line count) '(#\Space #\Tab))))))
(let* ((hash-count (loop for c across line while (char= c #\#) count c))
(content (string-trim (list #\Space #\Tab) (subseq line hash-count))))
(cons :heading (cons hash-count content))))
((char= (char line 0) #\>)
(cons :blockquote (string-trim (list #\Space #\Tab) (subseq line 1))))
((and (>= (length line) 2) (find (char line 0) "-*+")
(char= (char line 1) #\Space))
(cons :list-item (string-trim (list #\Space #\Tab) (subseq line 2))))
((and (>= (length line) 3) (digit-char-p (char line 0))
(loop for c across line while (digit-char-p c)
finally (return (find c ". )"))))
(let ((dot-pos (position-if (lambda (c) (find c ". )")) line)))
(if (and dot-pos (find (char line dot-pos) ". )"))
(cons :ordered-item (string-trim (list #\Space #\Tab)
(subseq line (1+ dot-pos))))
(cons :paragraph line))))
((and (>= (length line) 4) (find (char line 0) "-+")
(char= (char line 1) (char line 0))
(char= (char line 2) (char line 0))
(char= (char line 3) #\Space))
(cons :diff-header line))
((and (>= (length line) 1) (find (char line 0) "-+")
(not (and (>= (length line) 3)
(char= (char line 1) (char line 0))
(char= (char line 2) (char line 0)))))
(cons :diff-line (cons (char line 0) (subseq line 1))))
((and (>= (length line) 3) (find (char line 0) "`~")
(let ((fence-len (loop for c across line
while (char= c (char line 0)) count c)))
(and (>= fence-len 3)
(let ((rest (string-trim (list #\Space #\Tab)
(subseq line fence-len))))
(cons :code-start rest))))))
(t (cons :paragraph line))))
find-closing-marker
Scans for a literal marker string starting from position start,
escaping backslash-escaped markers. This is shared by inline
emphasis, code span, and link parsing. Returns the position or nil.
(defun find-closing-marker (text start marker)
(let ((marker-len (length marker)) (len (length text)))
(loop for j from start to (- len marker-len)
do (when (and (char= (char text j) (char marker 0))
(string= marker (subseq text j (+ j marker-len)))
(or (= j 0) (not (char= (char text (1- j)) #\\))))
(return j))
finally (return nil))))
parse-paragraph
Collects consecutive paragraph lines (lines classified as :paragraph)
into a single :paragraph node. Stops at a blank line or any
non-paragraph classification. Lines are joined with spaces before
inline parsing.
(defun parse-paragraph (lines start)
(let ((text-parts nil) (i start))
(loop while (< i (length lines))
do (let* ((raw-line (aref lines i))
(line (string-trim (list #\return) raw-line))
(class (classify-line line)))
(case (car class)
((:paragraph) (push (cdr class) text-parts) (incf i))
(:blank (incf i) (loop-finish))
(t (loop-finish)))))
(values (make-md-node :paragraph :children
(parse-inline
(with-output-to-string (s)
(loop for part in (nreverse text-parts)
for first = t then nil
do (unless first (write-char #\Space s))
(princ part s)))))
i)))
parse-blockquote
Like parse-paragraph but collects :blockquote lines and strips the
leading > marker. The collected text is then inline-parsed to
support bold, italic, code, and links inside quotes.
(defun parse-blockquote (lines start)
(let ((text-parts nil) (i start))
(loop while (< i (length lines))
do (let* ((raw-line (aref lines i))
(line (string-trim (list #\return) raw-line))
(class (classify-line line)))
(case (car class)
(:blockquote (push (cdr class) text-parts) (incf i))
(:blank (incf i) (loop-finish))
(t (loop-finish)))))
(values (make-md-node :blockquote :children
(parse-inline
(with-output-to-string (s)
(loop for part in (nreverse text-parts)
for first = t then nil
do (unless first (write-char #\Space s))
(princ part s)))))
i)))
parse-list
Handles both unordered (:list-item) and ordered (:ordered-item)
list items. Adjacent blank lines between items are allowed (creating
loose lists), but a blank line followed by a non-list line terminates
the list. Returns multiple nodes because each top-level list item
becomes its own :list-item or :ordered-item node.
(defun parse-list (lines start)
(let ((items nil) (i start))
(loop while (< i (length lines))
do (let* ((raw-line (aref lines i))
(line (string-trim (list #\return) raw-line))
(class (classify-line line)))
(case (car class)
((:list-item :ordered-item)
(push (cons (car class) (cdr class)) items) (incf i))
(:blank
(if (and (< (1+ i) (length lines))
(let ((nc (classify-line
(string-trim (list #\return)
(aref lines (1+ i))))))
(member (car nc) '(:list-item :ordered-item))))
(progn (push (cons :blank-sep nil) items) (incf i))
(progn (incf i) (loop-finish))))
(t (loop-finish)))))
(let ((nodes nil))
(dolist (item (nreverse items))
(let ((type (car item)) (content (cdr item)))
(when (and content (not (string= content "")))
(push (make-md-node type :children (parse-inline content)) nodes))))
(values (nreverse nodes) i))))
parse-code-block
Parses a fenced code block starting at start. The fence character
and length are detected from the opening line; the closing fence must
match in character and be at least as long. The language (if any) is
taken from the info string on the opening fence. Produces a single
:code-block node.
(defun parse-code-block (lines start lang)
(let ((code-lines nil)
(i (1+ start))
(fence-char (char (aref lines start) 0))
(fence-len (loop for c across (aref lines start)
while (char= c (char (aref lines start) 0)) count c)))
(loop while (< i (length lines))
do (let* ((raw-line (aref lines i))
(line (string-trim (list #\return) raw-line)))
(when (and (>= (length line) fence-len)
(every (lambda (c) (char= c fence-char))
(subseq line 0 fence-len))
(or (= (length line) fence-len)
(every (lambda (c) (find c " \t"))
(subseq line fence-len))))
(incf i) (loop-finish))
(push line code-lines)
(incf i)))
(values (make-md-node :code-block
:properties (list :language (and lang (not (string= lang "")) lang))
:content
(with-output-to-string (s)
(loop for cl in (nreverse code-lines)
for first = t then nil
do (unless first (terpri s)) (princ cl s))))
i)))
parse-diff-block
Collects consecutive diff lines (:diff-header, :diff-line) into a
single :diff-block node. The raw lines are preserved in a :lines
property for coloured rendering later. Diff blocks are delimited by
blank lines.
(defun parse-diff-block (lines start)
(let ((diff-lines nil) (i start))
(loop while (< i (length lines))
do (let* ((raw-line (aref lines i))
(line (string-trim (list #\return) raw-line))
(class (classify-line line)))
(case (car class)
((:diff-header :diff-line) (push line diff-lines) (incf i))
(:blank (incf i) (loop-finish))
(t (loop-finish)))))
(let ((lines-list (nreverse diff-lines)))
(values (make-md-node :diff-block
:content
(with-output-to-string (s)
(loop for dl in lines-list
for first = t then nil
do (unless first (terpri s)) (princ dl s)))
:properties (list :lines lines-list))
i))))
parse-blocks
Top-level block parser. Dispatches on the classify-line result to
call the appropriate sub-parser, accumulating nodes into a list.
Handles blank lines, thematic breaks, headings, paragraphs,
blockquotes, lists, code blocks, and diff blocks. Returns nil for
nil input.
(defun parse-blocks (text)
(unless text (return-from parse-blocks nil))
(let ((lines (split-string-into-lines text)) (nodes nil) (i 0))
(loop while (< i (length lines))
do (let* ((line (string-trim (list #\return) (aref lines i)))
(classification (classify-line line)))
(case (car classification)
(:blank (incf i))
(:thematic-break (push (make-md-node :thematic-break) nodes) (incf i))
(:paragraph
(multiple-value-bind (node consumed) (parse-paragraph lines i)
(push node nodes) (setf i consumed)))
(:heading
(let* ((level+content (cdr classification))
(level (car level+content))
(content (cdr level+content)))
(push (make-md-node :heading :properties (list :level level)
:children (parse-inline content)) nodes)
(incf i)))
(:blockquote
(multiple-value-bind (node consumed) (parse-blockquote lines i)
(push node nodes) (setf i consumed)))
(:list-item
(multiple-value-bind (node consumed) (parse-list lines i)
(dolist (n node) (push n nodes)) (setf i consumed)))
(:ordered-item
(multiple-value-bind (node consumed) (parse-list lines i)
(dolist (n node) (push n nodes)) (setf i consumed)))
(:code-start
(multiple-value-bind (node consumed)
(parse-code-block lines i (cdr classification))
(push node nodes) (setf i consumed)))
(:diff-header
(multiple-value-bind (node consumed) (parse-diff-block lines i)
(push node nodes) (setf i consumed)))
(t (incf i)))))
(nreverse nodes)))
Inline parser
The inline parser handles character-level formatting inside block content: emphasis, code spans, and links.
parse-inline
Main inline dispatcher. Walks the text character by character.
* triggers star emphasis; _ triggers underscore emphasis; `
triggers inline code; [ triggers links; everything else is
accumulated as plain :text nodes. Consecutive plain text is merged
into single nodes for efficiency.
(defun parse-inline (text)
(unless (and text (> (length text) 0)) (return-from parse-inline nil))
(let ((nodes nil) (i 0) (len (length text)))
(loop while (< i len)
do (let ((c (char text i)))
(case c
(#\*
(multiple-value-bind (node consumed) (parse-star-emphasis text i len)
(if node (progn (push node nodes) (setf i consumed))
(progn (push (make-md-node :text :content (string c)) nodes) (incf i)))))
(#\_
(multiple-value-bind (node consumed) (parse-underscore-emphasis text i len)
(if node (progn (push node nodes) (setf i consumed))
(progn (push (make-md-node :text :content (string c)) nodes) (incf i)))))
(#\`
(multiple-value-bind (node consumed) (parse-inline-code text i len)
(if node (progn (push node nodes) (setf i consumed))
(progn (push (make-md-node :text :content (string c)) nodes) (incf i)))))
(#\[
(multiple-value-bind (node consumed) (parse-link text i len)
(if node (progn (push node nodes) (setf i consumed))
(progn (push (make-md-node :text :content (string c)) nodes) (incf i)))))
(t (let ((start i))
(incf i)
(loop while (< i len)
do (let ((nc (char text i)))
(if (find nc "*_`[") (loop-finish)
(progn
(when (and (< (1+ i) len)
(find nc "*_")
(char= nc (char text (1+ i))))
(loop-finish))
(incf i)))))
(push (make-md-node :text :content (subseq text start i)) nodes))))))
(nreverse nodes)))
parse-star-emphasis
Handles *italic* and **bold** using star markers. A double star
is tried first; if the closing ** is found it produces a :bold
node, otherwise it falls back to single-star :italic. If neither
closes, returns nil to let the caller treat the character as literal
text.
(defun parse-star-emphasis (text i len)
(when (>= i len) (return-from parse-star-emphasis (values nil i)))
(if (and (< (1+ i) len) (char= (char text (1+ i)) #\*))
(let ((close (find-closing-marker text (+ i 2) "**")))
(if close
(values (make-md-node :bold :children (parse-inline (subseq text (+ i 2) close)))
(+ close 2))
(values nil i)))
(let ((close (find-closing-marker text (1+ i) "*")))
(if close
(values (make-md-node :italic :children (parse-inline (subseq text (1+ i) close)))
(1+ close))
(values nil i)))))
parse-underscore-emphasis
Handles _italic_ and __bold__ using underscore markers.
Underscore emphasis is more restrictive than star emphasis: it only
opens after whitespace or at the start of text, and single-underscore
italic only closes before whitespace or punctuation. This avoids false
positives in identifiers like foo_bar.
(defun parse-underscore-emphasis (text i len)
(when (>= i len) (return-from parse-underscore-emphasis (values nil i)))
(when (and (> i 0) (not (find (char text (1- i)) " \t\n\r")))
(return-from parse-underscore-emphasis (values nil i)))
(if (and (< (1+ i) len) (char= (char text (1+ i)) #\_))
(let ((close (find-closing-marker text (+ i 2) "__")))
(if close
(values (make-md-node :bold :children (parse-inline (subseq text (+ i 2) close)))
(+ close 2))
(values nil i)))
(let ((close (find-closing-marker text (1+ i) "_")))
(if (and close
(or (>= (1+ close) len)
(find (char text (1+ close)) " \t\n\r.,;:!?")))
(values (make-md-node :italic :children (parse-inline (subseq text (1+ i) close)))
(1+ close))
(values nil i)))))
parse-inline-code
Parses backtick-delimited inline code spans. Supports up to three backticks as delimiters (so single backticks inside double-backtick spans work). The matched pair's backtick count must be equal.
(defun parse-inline-code (text i len)
(when (or (>= i len) (not (char= (char text i) #\`)))
(return-from parse-inline-code (values nil i)))
(let ((bt-count (loop for j from i below (min len (+ i 3))
while (char= (char text j) #\`) count j)))
(let ((close (find-closing-marker text (+ i bt-count)
(make-string bt-count :initial-element #\`))))
(if close
(values (make-md-node :inline-code
:content (subseq text (+ i bt-count) close))
(+ close bt-count))
(values nil i)))))
parse-link
Parses Markdown links in the form [text](url). Uses nested bracket
matching via find-closing-marker. The text portion is inline-parsed
to support formatting inside link text. Returns nil if the syntax
is incomplete, letting the caller render the [ as literal text.
(defun parse-link (text i len)
(when (or (>= i len) (not (char= (char text i) #\[)))
(return-from parse-link (values nil i)))
(let ((close-bracket (find-closing-marker text (1+ i) "]")))
(unless close-bracket (return-from parse-link (values nil i)))
(when (or (>= (1+ close-bracket) len)
(not (char= (char text (1+ close-bracket)) #\()))
(return-from parse-link (values nil i)))
(let ((close-paren (find-closing-marker text (+ close-bracket 2) ")")))
(unless close-paren (return-from parse-link (values nil i)))
(values (make-md-node :link
:children (parse-inline (subseq text (1+ i) close-bracket))
:url (subseq text (+ close-bracket 2) close-paren))
(1+ close-paren)))))
Syntax highlighting
Syntax highlighting tokenises source code into (token . category) pairs that the renderer colours with ANSI escape codes. Each supported language has a definition of comment, string, keyword, and builtin patterns.
get-highlighter
Returns a plist of highlighting rules for a given language name.
The rules define :comment, :string, :keyword, and :builtin
patterns. Supported languages: lisp, common-lisp, python,
javascript, bash, shell. Unknown languages return nil, which tells
the caller to fall back to plain rendering. The assoc list uses
string= for matching on the language tag, and each entry uses a
dotted-pair format (\"language\" . plist).
(defun get-highlighter (lang)
(cdr (assoc lang
'(("lisp" . (:comment (";" "#|" ";;") :string ("\"")
:keyword ("defun" "defmacro" "defmethod" "defgeneric"
"defvar" "defparameter" "defconstant" "defstruct"
"defclass" "deftype" "define-condition"
"let" "let*" "flet" "labels" "macrolet"
"if" "when" "unless" "cond" "case" "ecase" "typecase"
"loop" "do" "dolist" "dotimes" "tagbody" "go"
"block" "return" "return-from"
"progn" "prog1" "prog2"
"lambda" "function" "quote"
"setf" "setq" "push" "pop" "incf" "decf"
"in-package" "defpackage" "export" "import"
"handler-case" "handler-bind" "ignore-errors"
"multiple-value-bind" "multiple-value-call"
"destructuring-bind"
"declare" "the" "values"
"and" "or" "not" "null"
"car" "cdr" "first" "rest" "second"
"cons" "list" "append" "nconc"
"mapcar" "mapc" "reduce"
"find" "position" "count" "subseq"
"format" "princ" "print" "write" "read"
"load" "compile" "eval"
"make-instance" "slot-value"
"type-of" "class-of")
:builtin ("t" "nil"
"*standard-output*" "*standard-input*"
"*error-output*" "*debug-io*"
"*package*" "*print-circle*")))
("common-lisp" . (:comment (";" "#|" ";;") :string ("\"")
:keyword ("defun" "defmacro" "defmethod" "defgeneric"
"let" "if" "when" "unless" "cond" "case"
"loop" "do" "dolist" "dotimes"
"return" "return-from" "block"
"lambda" "function" "quote"
"setf" "setq" "push" "pop" "incf" "decf"
"handler-case" "handler-bind"
"declare" "the" "values"
"defpackage" "in-package" "export" "import"
"error" "warn" "assert"
"car" "cdr" "first" "rest"
"cons" "list" "append" "mapcar" "reduce"
"format" "princ" "print" "read" "load"
"make-instance")
:builtin ("t" "nil")))
("python" . (:comment ("#") :string ("\"" "'" "\"\"\"" "'''")
:keyword ("def" "class" "return" "yield" "import" "from"
"if" "elif" "else" "for" "while" "in" "not"
"try" "except" "finally" "raise" "with" "pass"
"break" "continue" "lambda" "global"
"assert" "del" "is"
"self" "cls" "async" "await")
:builtin ("None" "True" "False")))
("javascript" . (:comment ("//" "/*") :string ("\"" "'" "`")
:keyword ("function" "class" "const" "let" "var"
"if" "else" "for" "while" "do" "switch"
"return" "break" "continue"
"try" "catch" "finally" "throw"
"new" "this" "super" "delete" "typeof"
"import" "export" "from" "default"
"async" "await" "yield" "of")
:builtin ("true" "false" "null" "undefined" "NaN")))
("bash" . (:comment ("#") :string ("\"" "'")
:keyword ("if" "then" "else" "elif" "fi" "for" "while"
"done" "case" "esac" "in" "function" "return"
"export" "local" "unset" "source"
"echo" "printf" "read" "test" "let" "declare")
:builtin ("true" "false" "cd" "ls" "cat" "grep" "sed"
"mv" "cp" "rm" "mkdir" "touch" "find" "wc"
"head" "tail" "date" "sleep" "kill")))
("shell" . (:comment ("#") :string ("\"" "'")
:keyword ("if" "then" "else" "elif" "fi" "for" "while"
"done" "case" "esac" "in" "function" "return"
"export" "local" "unset" "source"
"echo" "printf" "read" "test")
:builtin ("true" "false" "cd" "ls" "grep" "sed"
"mv" "cp" "rm" "mkdir" "touch" "find"))))
:test #'string=)))
tokenize-line
Tokenises a single line of source code into (token . category)
pairs. Categories are :plain, :comment, :string, :number,
:keyword, :builtin, and :function. The highlighter plist
provides the patterns for comment delimiters, string delimiters,
keywords, and builtins. Words immediately followed by ( are
classified as :function calls.
(defun tokenize-line (line highlighter)
(let ((tokens nil) (i 0) (len (length line))
(comment-chars (getf highlighter :comment))
(string-chars (getf highlighter :string))
(keywords (getf highlighter :keyword))
(builtins (getf highlighter :builtin)))
(loop while (< i len)
do (let ((c (char line i)))
(cond
((find c " \t")
(let ((start i))
(loop while (and (< i len) (find (char line i) " \t")) do (incf i))
(push (cons (subseq line start i) :plain) tokens)))
((and comment-chars
(some (lambda (cc)
(and (<= (+ i (length cc)) len)
(string= cc (subseq line i (+ i (length cc))))))
comment-chars))
(push (cons (subseq line i) :comment) tokens) (setf i len))
((and string-chars (some (lambda (s) (find c s)) string-chars))
(let ((start i))
(incf i)
(let ((triple (and (< i (1- len)) (char= (char line i) c)
(char= (char line (1+ i)) c))))
(if triple
(progn (incf i 2)
(loop while (and (< i len)
(not (and (char= (char line i) c)
(< (1+ i) len)
(char= (char line (1+ i)) c)
(< (+ i 2) len)
(char= (char line (+ i 2)) c))))
do (incf i))
(incf i 3))
(progn (loop while (and (< i len) (char/= (char line i) c))
do (incf i))
(when (< i len) (incf i)))))
(push (cons (subseq line start i) :string) tokens)))
((or (digit-char-p c)
(and (find c "+-") (< (1+ i) len) (digit-char-p (char line (1+ i)))))
(let ((start i))
(loop while (and (< i len) (not (find (char line i) " \t()[]{}'\";:#")))
do (incf i))
(let ((token (subseq line start i)))
(if (digit-char-p (char token 0))
(push (cons token :number) tokens)
(push (cons token :plain) tokens)))))
((or (alpha-char-p c)
(and (find c "-_?!*<>=") (> len 1)))
(let ((start i))
(loop while (and (< i len)
(or (alphanumericp (char line i))
(find (char line i) "-_?!*<>=")))
do (incf i))
(let* ((token (subseq line start i))
(down (string-downcase token)))
(cond
((find down keywords :test #'string=)
(push (cons token :keyword) tokens))
((find down builtins :test #'string=)
(push (cons token :builtin) tokens))
(t (if (and (< i len) (char= (char line i) #\())
(push (cons token :function) tokens)
(push (cons token :plain) tokens)))))))
(t (push (cons (string c) :plain) tokens) (incf i)))))
(nreverse tokens)))
highlight-code
Applies syntax highlighting to a whole code string. Splits the code
into lines, tokenises each line with the language's highlighter, and
returns a flat list of (token . category) pairs with newline
separators between lines. Returns nil for empty input or a single
:plain pair if no highlighter is found for the language.
(defun highlight-code (code language)
(unless code (return-from highlight-code nil))
(let ((highlighter (get-highlighter (and language (string-downcase language)))))
(unless highlighter (return-from highlight-code (list (cons code :plain))))
(let ((tokens nil))
(with-input-from-string (stream code)
(loop for line = (read-line stream nil nil) while line
do (let ((line-tokens (tokenize-line line highlighter)))
(when tokens (push (cons (string #\Newline) :plain) tokens))
(setf tokens (nconc (nreverse line-tokens) tokens)))))
(nreverse tokens))))
apply-highlight-token
Wraps a single token in an ANSI escape code based on its highlight category. Keywords get colour 33 (yellow), builtins 36 (cyan), functions 34 (blue), comments 2 (dim), strings 32 (green), numbers 35 (magenta). Unrecognised categories render as plain text.
(defun apply-highlight-token (token category)
(let ((code (case category
(:keyword "33") (:builtin "36")
(:function "34") (:comment "2") (:string "32") (:number "35")
(t nil))))
(if code (format nil "~c[~am~a~c[0m" #\Esc code token #\Esc) token)))
apply-highlight-style
Coerces an adjustable character vector (accumulated during line rendering) back into a string. This is a thin wrapper that exists for potential future customisation of style application.
(defun apply-highlight-style (char-vector)
(coerce char-vector 'string))
Diff rendering
The diff rendering utilities classify diff lines and produce colourised output.
string-prefix-p
Utility predicate that checks whether string starts with prefix.
Avoids reimplementing this inline in multiple diff classifiers.
(defun string-prefix-p (prefix string)
(and (>= (length string) (length prefix))
(string= prefix (subseq string 0 (length prefix)))))
classify-diff-line
Classifies a single diff line into a semantic category: :file-header
(for +++ and --- lines), :hunk-header (for @@ lines), :added
(for + lines), :removed (for - lines), or :context (for
everything else). This powers colourised diff rendering.
(defun classify-diff-line (line)
(cond ((string-prefix-p "+++ " line) :file-header)
((string-prefix-p "--- " line) :file-header)
((string-prefix-p "@@" line) :hunk-header)
((string-prefix-p "+" line) :added)
((string-prefix-p "-" line) :removed)
(t :context)))
Rendering
The rendering layer converts parsed AST nodes into styled terminal
output strings. Each node type has its own renderer, and
render-md-node dispatches to the correct one.
apply-style
Wraps text in ANSI escape codes for a given style keyword or
string. Supports both keyword (e.g. :bold) and string (e.g.
\"bold\") style designators for flexibility. Common styles include
bold, italic, dim, code, link, underline, and the full set of 16
terminal colours. Unrecognised styles return the text unchanged.
(defun apply-style (style text)
(let ((code (cond
((eql style :bold) "1") ((eql style :italic) "3")
((eql style :dim) "2") ((eql style :code) "0")
((eql style :link) "4;36") ((eql style :url) "4;2")
((eql style :underline) "4") ((eql style :strike) "9")
((eql style :black) "30") ((eql style :red) "31")
((eql style :green) "32") ((eql style :yellow) "33")
((eql style :blue) "34") ((eql style :magenta) "35")
((eql style :cyan) "36") ((eql style :white) "37")
((eql style :bright-black) "90") ((eql style :bright-red) "91")
((eql style :bright-green) "92") ((eql style :bright-yellow) "93")
((eql style :bright-blue) "94") ((eql style :bright-magenta) "95")
((eql style :bright-cyan) "96") ((eql style :bright-white) "97")
((string= style "bold") "1") ((string= style "italic") "3")
((string= style "dim") "2") ((string= style "code") "0")
((string= style "link") "4;36") ((string= style "url") "4;2")
((string= style "bright-cyan") "96")
((string= style "bright-yellow") "93")
((string= style "bright-white") "97")
((string= style "bright-red") "91")
((string= style "bright-green") "92")
((string= style "bright-blue") "94")
((string= style "bright-magenta") "95")
((string= style "cyan") "36") ((string= style "yellow") "33")
((string= style "red") "31") ((string= style "green") "32")
((string= style "blue") "34") ((string= style "magenta") "35")
((string= style "white") "37") ((string= style "black") "30")
(t nil))))
(if code (format nil "~c[~am~a~c[0m" #\Esc code text #\Esc) text)))
render-inline
Renders a list of inline child nodes into a single string. Handles
:text (plain), :bold, :italic, :inline-code, and :link
types. Links render the text styled as link followed by the URL in
parentheses styled as url.
(defun render-inline (children)
(if (null children) ""
(with-output-to-string (s)
(dolist (child children)
(let ((type (getf child :type)))
(case type
(:text (princ (or (getf child :content) "") s))
(:bold (princ (apply-style :bold (render-inline (getf child :children))) s))
(:italic (princ (apply-style :italic (render-inline (getf child :children))) s))
(:inline-code (princ (apply-style :code (or (getf child :content) "")) s))
(:link (let ((text (render-inline (getf child :children)))
(url (or (getf child :url) "")))
(princ (apply-style :link text) s)
(when (and url (not (string= url "")))
(princ " " s)
(princ (apply-style :url (format nil "(~a)" url)) s))))
(t (princ (or (getf child :content) "") s))))))))
render-heading
Renders a heading node as a coloured # Title line. The heading
level determines the number of # characters (capped at 6) and the
colour: level 1 uses bright-cyan, level 2 uses bright-yellow, and
deeper levels use bright-white.
(defun render-heading (node)
(let* ((level (or (getf (getf node :properties) :level) 1))
(prefix (make-string (min level 6) :initial-element #\#))
(text (render-inline (getf node :children)))
(color (cond ((= level 1) :bright-cyan) ((= level 2) :bright-yellow)
(t :bright-white))))
(list (apply-style color (concatenate 'string prefix " " text)))))
render-paragraph
Renders a paragraph node by inline-rendering its children. The result is a single-element list containing the rendered text.
(defun render-paragraph (node)
(list (render-inline (getf node :children))))
render-blockquote
Renders a blockquote node with a dimmed ~> ~ prefix before the inline-rendered content.
(defun render-blockquote (node)
(list (apply-style :dim (concatenate 'string "> " (render-inline (getf node :children))))))
render-code-block
Renders a fenced code block. If the block has a language tag and the
highlighter supports it, the code is syntax-highlighted with ANSI
colours. Otherwise it is rendered in plain :code style. A dimmed
language header line is shown when a language is present.
(defun render-code-block (node)
(let* ((language (or (getf (getf node :properties) :language) ""))
(content (or (getf node :content) ""))
(highlighted (unless (or (null language) (string= language ""))
(highlight-code content language)))
(lines nil))
(when (and language (not (string= language "")))
(push (apply-style :dim (format nil " ~~~~~~ ~a" language)) lines))
(if highlighted
(let ((cl (make-array 0 :element-type 'character
:fill-pointer 0 :adjustable t))
(output nil))
(dolist (pair highlighted)
(let ((token (car pair)) (category (cdr pair)))
(cond ((string= token (string #\Newline))
(push (apply-highlight-style cl) output)
(setf cl (make-array 0 :element-type 'character
:fill-pointer 0 :adjustable t)))
(t (let ((colored (apply-highlight-token token category)))
(loop for ch across colored
do (vector-push-extend ch cl)))))))
(when (> (length cl) 0) (push (apply-highlight-style cl) output))
(setf lines (nconc lines (nreverse output))))
(with-input-from-string (s content)
(loop for line = (read-line s nil nil) while line
do (push (apply-style :code line) lines))))
(nreverse lines)))
render-diff-block
Renders a diff block by classifying each line and applying colour: added lines in green (32), removed in red (31), hunk headers in cyan (36), file headers in bold-cyan (1;36), and context lines unstyled.
(defun render-diff-block (node)
(let* ((lines (getf (getf node :properties) :lines)) (result nil))
(dolist (line (or lines
(and (getf node :content)
(let ((l (split-string-into-lines (getf node :content))))
(loop for i from 0 below (length l) collect (aref l i))))))
(let* ((class (classify-diff-line line))
(color (case class
(:added "32") (:removed "31")
(:hunk-header "36") (:file-header "1;36") (t nil))))
(if color
(push (format nil "~c[~am~a~c[0m" #\Esc color line #\Esc) result)
(push line result))))
(nreverse result)))
render-thematic-break
Renders a thematic break as a dimmed horizontal rule using Unicode box-drawing characters.
(defun render-thematic-break (node)
(declare (ignore node))
(list (apply-style :dim "──────────────────────────────────────────────")))
render-list-item
Renders a list item node. Ordered items get ~ 1.~ prefix, unordered items get ~ * ~ prefix. The content is inline-rendered.
(defun render-list-item (node)
(list (concatenate 'string
(if (eql (getf node :type) :ordered-item) " 1." " * ")
(render-inline (getf node :children)))))
render-md-node
Dispatcher function that routes a single AST node to the correct
renderer based on its :type. Each type-specific renderer returns a
list of strings (multiple lines), which render-md concatenates.
(defun render-md-node (node)
(let ((type (getf node :type)))
(case type
(:heading (render-heading node))
(:paragraph (render-paragraph node))
(:blockquote (render-blockquote node))
(:code-block (render-code-block node))
(:diff-block (render-diff-block node))
(:thematic-break (render-thematic-break node))
(:list-item (render-list-item node))
(:ordered-item (render-list-item node))
(t (list "")))))
render-md
Renders a list of AST nodes (the output of parse-blocks) into a
flat list of output lines by calling render-md-node on each node
and concatenating the results.
(defun render-md (nodes)
(let ((lines nil))
(dolist (node nodes) (setf lines (nconc lines (render-md-node node))))
lines))
render-markdown
Top-level convenience function that parses a Markdown string and
renders it to a single output string with newline-separated lines.
Returns an empty string for nil input.
(defun render-markdown (text)
(unless text (return-from render-markdown ""))
(let ((nodes (parse-blocks text)) (parts nil))
(dolist (line (render-md nodes)) (push line parts))
(with-output-to-string (s)
(loop for part in (nreverse parts)
for first = t then nil
do (unless first (terpri s)) (princ part s)))))
Tests
The test suite covers parser edge cases, heading/paragraph parsing, inline formatting (bold, italic, code, links), code blocks, blockquotes, lists, diff classification, syntax highlighting, render output, and integration.
The first block writes the target file (defpackage/suite). Subsequent blocks append individual test groups.
Package and suite setup
This block must be first because tests/markdown-tests.lisp does not
exist yet — the tangle script creates it by writing this block's content.
All later blocks append.
;;; markdown-tests.lisp — Tests for cl-tty.markdown
(defpackage :cl-tty-markdown-test
(:use :cl :cl-tty.markdown :fiveam))
(in-package :cl-tty-markdown-test)
;; Test suite
(def-suite :cl-tty-markdown-test
:description "Markdown parser/renderer tests for cl-tty.markdown")
(in-suite :cl-tty-markdown-test)
Parser edge cases
Edge cases guard against crashes on nil input, very long lines, blank-only
input, and unclosed fenced blocks. These come first because they exercise the
defensive gate checks at the top of each parsing function.
;; ─── Parser edge cases ─────────────────────────────────────────
(def-test render-markdown-nil ( )
"render-markdown handles nil gracefully."
(is (string= "" (render-markdown nil))))
(def-test render-markdown-empty ( )
"render-markdown handles empty string."
(let ((result (render-markdown "")))
(is (stringp result))
(is (string= "" result))))
(def-test parse-blocks-nil ( )
"parse-blocks handles nil gracefully."
(is-false (parse-blocks nil)))
(def-test split-string-into-lines-nil ( )
"parse-blocks handles nil input (tests internal split-string-into-lines)."
(is-false (parse-blocks nil)))
(def-test nested-bold-inside-italic ( )
"Nested formatting: bold inside italic."
(let ((children (parse-inline "***hello*** world")))
(is (= 3 (length children)))
(let ((first-node (first children)))
(is-true (eql :bold (getf first-node :type))))))
(def-test nested-italic-inside-bold ( )
"Nested formatting: italic inside bold."
(let ((children (parse-inline "**bold *italic* bold**")))
(is (= 1 (length children)))
(let ((bold (first children)))
(is-true (eql :bold (getf bold :type)))
(let ((inner (getf bold :children)))
(is (= 3 (length inner)))
(is-true (eql :italic (getf (second inner) :type)))))))
(def-test inline-code-inside-bold ( )
"Code inside bold."
(let ((children (parse-inline "**bold `code` bold**")))
(is (= 1 (length children)))
(let ((bold (first children)))
(is-true (eql :bold (getf bold :type)))
(let ((inner (getf bold :children)))
(is (= 3 (length inner)))
(is-true (eql :inline-code (getf (second inner) :type)))))))
(def-test unclosed-code-block ( )
"Unclosed code block accumulates remaining lines as content."
(let* ((lines '("```lisp" "(defun foo ())" " (bar)"))
(text (format nil "~{~a~%~}" lines))
(result (parse-blocks text))
(node (first result)))
(is-true (eql :code-block (getf node :type)))
(is (equal "lisp" (getf (getf node :properties) :language)))
(is-true (search "bar" (getf node :content)))))
(def-test code-block-no-language ( )
"Code block with no language is still parsed."
(let* ((lines '("```" "plain" "```"))
(text (format nil "~{~a~%~}" lines))
(result (parse-blocks text))
(node (first result)))
(is-true (eql :code-block (getf node :type)))
(is-false (getf (getf node :properties) :language))))
(def-test markdown-very-long-line ( )
"A very long paragraph line does not cause issues."
(let* ((long-line (make-string 500 :initial-element #\x))
(result (render-markdown long-line)))
(is (stringp result))
(is-true (> (length result) 0))))
(def-test markdown-only-blank ( )
"Only blank lines produce empty output."
(is (string= "" (render-markdown (format nil "~%~%")))))
Heading parsing
ATX headings from level 1 through 6, including headings with inline formatting inside the heading text.
;; ─── Parser tests ─────────────────────────────────────────────────────────────
(def-test heading-parsing ( )
(let* ((result (parse-blocks "# Hello World")) (node (first result)))
(is-true (eql :heading (getf node :type)))
(is (= 1 (getf (getf node :properties) :level)))))
(def-test heading-levels ( )
(loop for level from 1 to 6
do (let* ((hashes (make-string level :initial-element #\#))
(text (format nil "~a Heading ~d" hashes level))
(result (parse-blocks text))
(node (first result)))
(is-true (eql :heading (getf node :type)))
(is (= level (getf (getf node :properties) :level))))))
(def-test heading-with-inline-formatting ( )
(let* ((result (parse-blocks "# Hello **World**"))
(node (first result)) (children (getf node :children)))
(is-true (eql :heading (getf node :type)))
(is (= 2 (length children)))
(is-true (eql :text (getf (first children) :type)))
(is-true (eql :bold (getf (second children) :type)))))
Paragraph parsing
Single-line and multi-line paragraphs. Multi-line paragraphs are joined with spaces before inline parsing.
(def-test paragraph-parsing ( )
(let* ((result (parse-blocks "This is a paragraph.")) (node (first result)))
(is-true (eql :paragraph (getf node :type)))))
(def-test paragraph-multi-line ( )
(let* ((result (parse-blocks "Line one\nLine two")) (node (first result)))
(is-true (eql :paragraph (getf node :type)))))
Inline formatting
Bold, italic, combined bold+italic, inline code, and link parsing. Each test verifies both structure (node types) and content (text/url values).
(def-test bold-parsing ( )
(let* ((children (parse-inline "hello **world** here"))
(bold-node (second children)))
(is (= 3 (length children)))
(is-true (eql :bold (getf bold-node :type)))))
(def-test italic-parsing ( )
(let* ((children (parse-inline "hello *world* here"))
(italic-node (second children)))
(is (= 3 (length children)))
(is-true (eql :italic (getf italic-node :type)))))
(def-test bold-italic-combined ( )
(let ((children (parse-inline "**bold** and *italic*")))
(is (= 3 (length children)))
(is-true (eql :bold (getf (first children) :type)))
(is-true (eql :italic (getf (third children) :type)))))
(def-test inline-code-parsing ( )
(let* ((children (parse-inline "use `foo` here"))
(code-node (second children)))
(is (= 3 (length children)))
(is-true (eql :inline-code (getf code-node :type)))
(is (equal "foo" (getf code-node :content)))))
(def-test link-parsing ( )
(let* ((children (parse-inline "click [here](https://x.com)"))
(link-node (second children)))
(is (= 2 (length children)))
(is-true (eql :link (getf link-node :type)))
(is (equal "https://x.com" (getf link-node :url)))
(let ((link-text (getf link-node :children)))
(is (= 1 (length link-text)))
(is-true (eql :text (getf (first link-text) :type)))
(is (equal "here" (getf (first link-text) :content))))))
Code block parsing
Fenced code blocks with and without a language annotation. Verifies the
presence/absence of the :language property on the resulting node.
(def-test code-block-parsing ( )
(let* ((lines '("```lisp" "(defun hello ())" " (print \"hi\")" "```"))
(text (format nil "~{~a~%~}" lines))
(result (parse-blocks text)) (node (first result)))
(is-true (eql :code-block (getf node :type)))
(is (equal "lisp" (getf (getf node :properties) :language)))
(is-true (search "(defun hello" (getf node :content)))))
(def-test code-block-unknown-language ( )
(let* ((lines '("```" "plain code" "```"))
(text (format nil "~{~a~%~}" lines))
(result (parse-blocks text)) (node (first result)))
(is-true (eql :code-block (getf node :type)))
(is-false (getf (getf node :properties) :language))))
Blockquote, list, and thematic-break parsing
Verifies that blockquote markers, unordered list items, ordered list items, and thematic breaks (—) are correctly classified and produce the expected node types.
(def-test blockquote-parsing ( )
(let* ((result (parse-blocks "> This is a quote")) (node (first result)))
(is-true (eql :blockquote (getf node :type)))))
(def-test list-item-parsing ( )
(let* ((result (parse-blocks "- First item")) (node (first result)))
(is-true (eql :list-item (getf node :type)))))
(def-test ordered-list-parsing ( )
(let* ((result (parse-blocks "1. First item")) (node (first result)))
(is-true (eql :ordered-item (getf node :type)))))
(def-test thematic-break-parsing ( )
(let* ((result (parse-blocks "---")) (node (first result)))
(is-true (eql :thematic-break (getf node :type)))))
Diff line classification
Tests classify-diff-line with each diff line variant: added (+),
removed (-), hunk header (@@), and context (neither).
;; ─── Diff tests ───────────────────────────────────────────────────────────────
(def-test classify-diff-added ( )
(is (eql :added (classify-diff-line "+this is added"))))
(def-test classify-diff-removed ( )
(is (eql :removed (classify-diff-line "-this is removed"))))
(def-test classify-diff-hunk ( )
(is (eql :hunk-header (classify-diff-line "@@ -1,3 +1,4 @@"))))
(def-test classify-diff-context ( )
(is (eql :context (classify-diff-line " normal context"))))
Syntax highlighting
Verifies that highlight-code returns categorised tokens for Lisp
keywords, builtins, comments, and falls back to plain tokens for
unknown languages.
;; ─── Syntax highlighting tests ────────────────────────────────────────────────
(def-test highlight-lisp-keyword ( )
(let ((tokens (highlight-code "(defun hello ()" "lisp")))
(is-true (some (lambda (pair) (and (search "defun" (car pair))
(eql :keyword (cdr pair))))
tokens))))
(def-test highlight-lisp-builtin ( )
"Test that a Lisp builtin like nil is highlighted as :builtin."
(let ((tokens (highlight-code "(if t nil)" "lisp")))
(is-true (some (lambda (pair) (and (string= (car pair) "nil")
(eql :builtin (cdr pair))))
tokens))))
(def-test highlight-unknown-language ( )
(let ((tokens (highlight-code "hello world" "unknown-xyz")))
(every (lambda (pair) (eql :plain (cdr pair))) tokens)))
(def-test highlight-comment ( )
(let ((tokens (highlight-code "; this is a comment" "lisp")))
(is-true (some (lambda (pair) (eql :comment (cdr pair))) tokens))))
Render output
Verifies that each node type produces output via render-md-node.
Heading, paragraph, thematic-break, code-block, and diff-block are
all exercised to ensure the render dispatcher routes correctly.
;; ─── Render tests ─────────────────────────────────────────────────────────────
(def-test render-heading-output ( )
(let* ((node (make-md-node :heading :properties (list :level 2)
:children (list (make-md-node :text :content "Test"))))
(lines (render-md-node node)))
(is (= 1 (length lines)))
(is-true (> (length (first lines)) 0))))
(def-test render-paragraph-output ( )
(let* ((node (make-md-node :paragraph
:children (list (make-md-node :text :content "Hello"))))
(lines (render-md-node node)))
(is (= 1 (length lines)))
(is-true (search "Hello" (first lines)))))
(def-test render-thematic-break-output ( )
(let* ((node (make-md-node :thematic-break)) (lines (render-md-node node)))
(is (= 1 (length lines)))))
(def-test render-code-block-output ( )
(let* ((node (make-md-node :code-block :content "(print \"hello\")"
:properties (list :language "lisp")))
(lines (render-md-node node)))
(is-true (> (length lines) 0))))
(def-test render-diff-block-output ( )
(let* ((node (make-md-node :diff-block :properties
(list :lines
'("--- a/file" "+++ b/file" "@@ -1 +1 @@"
"+added" "-removed" " context"))))
(lines (render-md-node node)))
(is (= 6 (length lines)))
(is (search "added" (fourth lines)))
(is (search "removed" (fifth lines)))))
Integration test and utilities
A full parse-and-render integration test exercises the pipeline end-to-end.
The md-node-text utility tests verify both simple and nested node
traversal.
;; ─── Integration tests ────────────────────────────────────────────────────────
(def-test markdown-integration ( )
(let* ((md (format nil "# Title~%~%This is **bold** and `code`.~%~%- Item 1~%~%- Item 2~%~%> A quote~%~%```lisp~%(defun hello ())~% (print \"hi\")~%```~%~%---"))
(nodes (parse-blocks md)) (lines (render-md nodes)))
(is-true (> (length lines) 5))
(is-true (search "# Title" (first lines)))))
(def-test render-markdown-string ( )
(let ((result (render-markdown "**bold** text")))
(is-true (stringp result))
(is-true (> (length result) 0))))
(def-test md-node-text-simple ( )
(let ((node (make-md-node :text :content "hello")))
(is (equal "hello" (md-node-text node)))))
(def-test md-node-text-nested ( )
(let ((node (make-md-node :paragraph :children
(list (make-md-node :text :content "hello")
(make-md-node :bold :children
(list (make-md-node :text :content "world")))))))
(is (equal "helloworld" (md-node-text node)))))