L2 Language Specification (January 2026)
This document reflects the implementation that ships in this repository today (main.py, stdlib, and tests). It replaces the previous aspirational draft with the behavior exercised by the compiler, runtime, and automated samples.
1. Scope and Principles
- Stack-based core – All user code manipulates a 64-bit data stack plus a separate return stack. Every definition is a “word.”
- Ahead-of-time native output – `main.py` emits NASM-compatible x86-64 assembly, assembles it with `nasm -f elf64`, and links it with `ld`/`ld.lld` into an ELF64 executable. Compile-time execution runs through a JIT, which the REPL also uses.
- Meta-programmable front-end – Parsing, macro expansion, and syntax sugar live in user space via immediate words, text macros, compile-time intrinsics, and `:py` blocks. Users can reshape syntax without touching the Python host.
- Unsafe by design – Memory, syscalls, inline assembly, and FFI expose raw machine power. The standard library is intentionally thin and policy-free.
2. Toolchain and Repository Layout
- Driver (`main.py`) – Supports `python main.py source.sl -o a.out`, `--emit-asm`, `--run`, `--dbg`, `--repl`, `--temp-dir`, `--clean`, `--dump-cfg[=path]`, repeated `-I`/`--include` paths, and repeated `-l` linker flags (either `-lfoo` or `-l libc.so.6`). Unknown `-l` flags are collected and forwarded to the linker. Pass `--ct-run-main` to run the program's `main` word on the compile-time VM before NASM/ld run, which surfaces discrepancies between compile-time and runtime semantics. Pass `--no-artifact` to stop after compilation/assembly emission without building an output file, or use `--script` as shorthand for `--no-artifact --ct-run-main`. Pass `--docs` to open a searchable TUI that scans stack-effect comments and nearby docs from `.sl` files (`--docs-query` sets the initial filter and `--docs-root` adds scan roots). `--no-folding` disables constant folding and `--no-peephole` disables peephole rewrites (for example `swap drop` → `nip`, `dup drop` removed, `swap over` → `tuck`, `nip drop` → `2drop`, `x 0 +` → `x`, `x 1 *` → `x`, `x -1 *` → `x neg`, and `not not` removed).
- REPL – `--repl` launches a stateful session with commands such as `:help`, `:reset`, `:load`, `:call <word>`, `:edit`, and `:show`.
- Imports – `import relative/or/absolute/path.sl` inserts the referenced file textually. Resolution order: (1) absolute path, (2) relative to the importing file, (3) each include path (defaults: project root and `./stdlib`). Each file is included at most once per compilation unit. Import lines leave blank placeholders so error spans stay meaningful.
- Workspace – `stdlib/` holds library modules, `tests/` contains executable samples with `.expected` outputs, `extra_tests/` houses standalone integration demos, and `libs/` collects opt-in extensions such as `libs/fn.sl` and `libs/nob.sl`.
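The peephole rules listed for `--no-peephole` compose, so they are easiest to reason about as a rewrite system run to a fixpoint. The Python sketch below models that idea on a flat list of word names. The `RULES` table and `peephole` function are hypothetical: the real pass in `main.py` works on its own representation rather than on raw token strings.

```python
# Hypothetical model of the documented peephole rules; not main.py's actual pass.
RULES: list[tuple[tuple[str, ...], tuple[str, ...]]] = [
    (("swap", "drop"), ("nip",)),
    (("dup", "drop"), ()),
    (("swap", "over"), ("tuck",)),
    (("nip", "drop"), ("2drop",)),
    (("0", "+"), ()),          # x 0 +   ->  x
    (("1", "*"), ()),          # x 1 *   ->  x
    (("-1", "*"), ("neg",)),   # x -1 *  ->  x neg
    (("not", "not"), ()),
]


def peephole(ops: list[str]) -> list[str]:
    """Apply RULES left to right, repeating whole passes until nothing changes."""
    changed = True
    while changed:
        changed = False
        out: list[str] = []
        i = 0
        while i < len(ops):
            for pattern, replacement in RULES:
                n = len(pattern)
                if tuple(ops[i:i + n]) == pattern:
                    out.extend(replacement)  # splice in the shorter sequence
                    i += n
                    changed = True
                    break
            else:
                out.append(ops[i])  # no rule starts here; keep the op
                i += 1
        ops = out
    return ops
```

The fixpoint loop matters because rules chain. For example, `swap drop drop` first becomes `nip drop` and then `2drop`.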
3. Lexical Structure
- Reader – Whitespace-delimited; `#` starts a line comment. String literals honor `\"`, `\\`, `\n`, `\r`, `\t`, and `\0`. Numbers default to signed 64-bit integers via `int(token, 0)` (so the `0x`, `0o`, and `0b` prefixes all work). Tokens containing `.` or `e` parse as floats.
- Identifiers – `[A-Za-z_][A-Za-z0-9_]*`. Everything else is treated as punctuation or a literal.
- String representation – At runtime each literal pushes `(addr len)` with the length on top. The assembler stores literals in `section .data` with a trailing NULL for convenience.
- Lists – `[` begins a list literal and `]` ends it. The compiler captures the intervening stack segment into a freshly `mmap`'d buffer that stores the length followed by the qword items, drops the captured values, and pushes the buffer address. Users must `munmap` the buffer when done. When all elements are known at compile time, the list is folded and placed in `.bss`, so it does not need to be freed; disable this optimization with `--no-static-list-folding`.
- Token customization – Immediate words can call `add-token` or `add-token-chars` to teach the reader about new multi-character tokens. `libs/fn.sl` uses this in combination with token hooks to recognize `foo(1, 2)` syntax.
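The reader's literal rules can be modelled in a few lines of Python. This is a sketch under stated assumptions:

- `classify` and `decode_escapes` are hypothetical names.
- Falling back to a plain word when float parsing fails (for tokens such as `else`) is an assumption, not documented behavior.
- The model does not range-check integers against 64 bits.

```python
# Hypothetical model of the reader's literal handling; names are illustrative.
ESCAPES = {'"': '"', "\\": "\\", "n": "\n", "r": "\r", "t": "\t", "0": "\0"}


def classify(token: str) -> tuple[str, object]:
    """Return ("int", n), ("float", x), or ("word", token)."""
    try:
        # Base 0 makes Python honor the 0x, 0o, and 0b prefixes.
        return ("int", int(token, 0))
    except ValueError:
        pass
    if "." in token or "e" in token:
        try:
            return ("float", float(token))
        except ValueError:
            pass  # assumption: e.g. "else" contains "e" but stays a word
    return ("word", token)


def decode_escapes(body: str) -> str:
    """Expand the six escapes the reader honors; other backslashes are kept (assumption)."""
    out: list[str] = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in ESCAPES:
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

Trying integer parsing first is what keeps a hex literal such as `0x1e` from being misread as a float just because it contains an `e`.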
Stack-effect comments
- Location and prefix – Public words in `stdlib/` (and, ideally, most user code) document their stack effect with a line comment directly above the definition: `#word_name …`.
- Before/after form – Use `[before] -> [after]`, where each side is a comma-separated list. Items to the left of `|` sit deeper in the stack; the item to the right of `|` is the topmost element. Omit the `|` only when a side holds nothing but the tail (`[*]`).
- Tail sentinel – `*` represents the untouched rest of the stack. By convention it is always the first entry on each side so readers can quickly see which values are consumed/produced.
- Alternatives – Separate multiple outcomes with `||`. Each branch repeats the `[before] -> [after]` structure (e.g., `#read_file [*, path | len] -> [*, addr | len] || [*, tag | neg_errno]`).
- Examples – `#dup [* | x] -> [*, x | x]` means the word consumes the top value `x` and returns two copies, with the newest copy at TOS; `#arr_pop [* | arr] -> [*, arr | x]` states that the array pointer remains just below the popped element. This notation keeps stack order reasonably easy to read and grep.
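A small parser makes the stack-effect grammar concrete. The sketch below is hypothetical and is not the `--docs` scanner; `Side`, `parse_side`, and `parse_effect` are illustrative names. In the `read_file` example, the second outcome has no `[before] ->` of its own. The sketch therefore assumes that such a branch reuses the previous branch's before side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Side:
    deeper: list[str]   # items left of "|", deepest first (normally starts with "*")
    top: Optional[str]  # the item right of "|", or None for a bare "[*]"


def parse_side(text: str) -> Side:
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError(f"expected [...], got {text!r}")
    inner = inner[1:-1]
    if "|" in inner:
        left, right = inner.split("|", 1)
        top: Optional[str] = right.strip()
    else:
        left, top = inner, None
    deeper = [item.strip() for item in left.split(",") if item.strip()]
    return Side(deeper, top)


def parse_effect(comment: str) -> tuple[str, list[tuple[Side, Side]]]:
    """Parse '#name [before] -> [after] || ...' into (name, [(before, after), ...])."""
    if not comment.startswith("#"):
        raise ValueError("stack-effect comments start with '#'")
    name, _, rest = comment[1:].partition(" ")
    branches: list[tuple[Side, Side]] = []
    before: Optional[Side] = None
    for branch in rest.split("||"):  # split on "||" before looking at single "|"
        lhs, arrow, rhs = branch.partition("->")
        if arrow:
            before, after_text = parse_side(lhs), rhs
        elif before is not None:
            after_text = lhs  # assumption: outcome-only branch reuses the last before side
        else:
            raise ValueError(f"missing '->' in {branch!r}")
        branches.append((before, parse_side(after_text)))
    return name, branches
```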
4. Runtime Model
- Stacks – `r12` holds the data stack pointer and `r13` the return stack pointer. Both live in `.bss` buffers sized by `DSTK_BYTES`/`RSTK_BYTES` (default 64 KiB each). `stdlib/core.sl` implements all standard stack shuffles, arithmetic, comparisons, boolean ops, `@`/`!`, `c@`/`c!`, and return-stack transfers (`>r`, `r>`, `rdrop`, `rpick`).
- Calling convention – The System V ABI calling convention applies only to `extern` functions. `extern` words marshal arguments into registers before `call symbol`, then push results back onto the data stack. Integer results come from `rax`; floating results come from `xmm0` and are copied into a qword slot.
- Memory helpers – `mem` returns the address of the `persistent` buffer (default 64 bytes). `argc`, `argv`, and `argv@` expose process arguments. `alloc`/`free` wrap `mmap`/`munmap` for general-purpose buffers, while `memcpy` performs byte-wise copies.
- BSS customization – Compile-time words may call `bss-clear` followed by `bss-append`/`bss-set` to replace the default `.bss` layout (e.g., `tests/bss_override.sl` enlarges `persistent`).
- Strings & buffers – IO helpers consume explicit `(addr len)` pairs only; there is no implicit NULL contract except for stored literals.
- Structured data – `struct` blocks expand into constants and accessor words (`Foo.bar@`, `Foo.bar!`). Dynamic arrays in `stdlib/arr.sl` allocate `[len, cap, data_ptr, data...]` records via `mmap` and expose `arr_new`, `arr_len`, `arr_cap`, `arr_data`, `arr_push`, `arr_pop`, `arr_reserve`, and `arr_free`.
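The `[len, cap, data_ptr, data...]` record can be pictured as consecutive little-endian qwords. The Python sketch below builds and reads such an image with `struct`. The `record_*` helpers are hypothetical, not `stdlib/arr.sl` words. The sketch assumes that a fresh record's `data_ptr` holds the absolute address of the inline `data...` region, right after the three header qwords.

```python
import struct

QWORD = 8
# Assumed byte offsets for a [len, cap, data_ptr, data...] record of qwords.
LEN_OFF, CAP_OFF, DATA_PTR_OFF, INLINE_DATA_OFF = 0, 8, 16, 24


def make_record(base_addr: int, items: list[int], cap: int) -> bytearray:
    """Build a little-endian image of the record as if it lived at base_addr."""
    if cap < len(items):
        raise ValueError("cap must be at least len(items)")
    buf = bytearray(INLINE_DATA_OFF + cap * QWORD)
    # Header: len, cap, and data_ptr (assumed to point at the inline data region).
    struct.pack_into("<qqq", buf, LEN_OFF, len(items), cap, base_addr + INLINE_DATA_OFF)
    struct.pack_into(f"<{len(items)}q", buf, INLINE_DATA_OFF, *items)
    return buf


def read_qword(buf: bytearray, offset: int) -> int:
    return struct.unpack_from("<q", buf, offset)[0]


def record_len(buf: bytearray) -> int:
    return read_qword(buf, LEN_OFF)


def record_cap(buf: bytearray) -> int:
    return read_qword(buf, CAP_OFF)


def record_data(buf: bytearray) -> int:
    return read_qword(buf, DATA_PTR_OFF)


def record_item(buf: bytearray, base_addr: int, index: int) -> int:
    """Follow data_ptr (an absolute address) back into the buffer image."""
    if not 0 <= index < record_len(buf):
        raise IndexError(index)
    return read_qword(buf, record_data(buf) - base_addr + index * QWORD)
```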
5. Definitions, Control Flow, and Syntax Sugar
- Word definitions – Always `word name ... end`. Redefinitions overwrite the previous entry (a warning prints to stderr). `inline word name ... end` marks the definition for inline expansion; recursive inline calls are rejected. `immediate` and `compile-only` apply to the most recently defined word.
- Priority-based redefinition – Use `priority <int>` before `word`, `:asm`, `:py`, or `extern` to control conflicts for the same name. Higher priority wins; lower-priority definitions are ignored. Equal priority keeps the last definition (with a redefinition warning). The compiler prints a note indicating which priority was selected.
- Control forms – Built-in tokens drive code emission:
  - Default parser-level implementations for `if`, `else`, `for`, `while`, and `do` are always available.
  - Import `stdlib/control.sl` to override these defaults with custom compile-time words; when an override is active, the compiler warns and uses the custom implementation.
  - `if ... end` and `if ... else ... end`. To express additional branches, place `if` on the same line as the preceding `else` (e.g., `else <condition> if ...`); the reader treats that form as an implicit chained clause, so each inline `if` consumes one flag and jumps past later clauses on success.
  - `while <condition> do <body> end`; the conditional block lives between `while` and `do` and re-runs every iteration.
  - `n for ... end`; the loop count is popped, stored on the return stack, and decremented each pass. The compile-time word `i` exposes the loop index inside macros and cannot be used in runtime-emitted words.
  - `label name` / `goto name` perform local jumps within a definition.
  - `&name` pushes a pointer to word `name` (its callable code label). This is intended for indirect control flow; `&name jmp` performs a tail jump to that word and is compatible with `--ct-run-main`.
- Text macros – `macro name [param_count] ... ;` records raw tokens until `;`. `$0`, `$1`, ... expand to positional arguments. Macro definitions cannot nest (attempting to start another `macro` while recording raises a parse error).
- Struct builder – `struct Foo ... end` emits `<Foo>.size`, `<Foo>.field.size`, `<Foo>.field.offset`, `<Foo>.field@`, and `<Foo>.field!` helpers. Layout is tightly packed with no implicit padding.
- With-blocks – `with a b in ... end` rewrites occurrences of `a`/`b` into accesses against hidden global cells (e.g. `__with_a`). On entry the block pops the named values and stores them in those cells; reads compile to `@` and writes to `!`. Because the cells live in `.data`, the slots persist across calls and are not re-entrant.
- List literals – `[ values ... ]` captures the current stack slice, allocates storage (`mmap`), copies the elements, and pushes the pointer. The record stores `len` at offset 0 and the items afterwards, so user code can fetch the length via `@` and iterate.
- Compile-time execution – `compile-time foo` runs `foo` immediately but still emits it (if inside a definition). Immediate words always execute during parsing; ordinary words emit `word` ops for later code generation.
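Because struct layouts are tightly packed, each field's offset is simply the running sum of the sizes before it. The sketch below computes the constants the struct builder is described as emitting. `struct_constants` is a hypothetical helper, and it assumes field sizes are given in bytes.

```python
def struct_constants(struct_name: str, fields: list[tuple[str, int]]) -> dict[str, int]:
    """Compute <Foo>.size plus per-field .size/.offset with no implicit padding."""
    consts: dict[str, int] = {}
    seen: set[str] = set()
    offset = 0
    for field, size in fields:
        if field in seen:
            raise ValueError(f"duplicate field {field!r}")
        if size <= 0:
            raise ValueError(f"field {field!r} needs a positive size")
        seen.add(field)
        consts[f"{struct_name}.{field}.size"] = size
        consts[f"{struct_name}.{field}.offset"] = offset
        offset += size  # tightly packed: the next field starts immediately after
    consts[f"{struct_name}.size"] = offset
    return consts
```

With fields `tag` (1 byte), `value` (8), and `flags` (4), `value` lands at offset 1 and the struct is 13 bytes. A C compiler using natural alignment would instead place `value` at offset 8. Qword accesses to such a packed field are therefore unaligned, which x86-64 tolerates for ordinary loads and stores.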
6. Compile-Time Facilities
- Virtual machine – Immediate words run inside `CompileTimeVM`, which keeps its own stacks and exposes helpers registered in `bootstrap_dictionary()`:
  - Lists/maps: `list-new`, `list-append`, `list-pop`, `list-pop-front`, `list-length`, `list-empty?`, `list-get`, `list-set`, `list-extend`, `list-last`, `map-new`, `map-set`, `map-get`, `map-has?`.
  - Strings/numbers: `string=`, `string-length`, `string-append`, `string>number`, `int>string`.
  - Lexer utilities: `lexer-new`, `lexer-pop`, `lexer-peek`, `lexer-expect`, `lexer-collect-brace`, `lexer-push-back` (used by `libs/fn.sl` to parse signatures and infix expressions).
  - Token management: `next-token`, `peek-token`, `inject-tokens`, `token-lexeme`, `token-from-lexeme`.
  - Control-frame helpers: `ct-control-frame-new`, `ct-control-get`, `ct-control-set`, `ct-control-push`, `ct-control-pop`, `ct-control-peek`, `ct-control-depth`, `ct-control-add-close-op`, `ct-new-label`, `ct-emit-op`, `ct-last-token-line`.
  - Control registration: `ct-register-block-opener`, `ct-unregister-block-opener`, `ct-register-control-override`, `ct-unregister-control-override`.
  - Reader hooks: `set-token-hook` installs a word that receives each token (pushed as a `Token` object) and must leave a flag that is truthy when it handled the token; `clear-token-hook` disables the hook. `libs/fn.sl`'s `extend-syntax` demonstrates rewriting `foo(1, 2)` into ordinary word calls.
  - Prelude/BSS control: `prelude-clear`, `prelude-append`, `prelude-set`, `bss-clear`, `bss-append`, and `bss-set` let user code override the `_start` stub or the `.bss` layout.
  - Definition helpers: `emit-definition` injects a `word ... end` definition on the fly (used by the struct macro). `parse-error` raises a custom diagnostic.
  - Assertions: `static_assert` is a compile-time-only primitive that pops a condition and raises `ParseError("static assertion failed at <path>:<line>:<column>")` when the value is zero/false.
- Text macros – `macro` is an immediate word implemented in Python; it prevents nesting by tracking active recordings and registers expansion tokens with `$n` substitution.
- Python bridges – `:py name { ... } ;` executes once during parsing. The body may define `macro(ctx: MacroContext)` (with helpers such as `next_token`, `emit_literal`, `inject_tokens`, `new_label`, and direct `parser` access) and/or `intrinsic(builder: FunctionEmitter)` to emit assembly directly. The `fn` DSL (`libs/fn.sl`) and other syntax layers are ordinary `:py` blocks.
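The recording and substitution steps of text macros can be modelled as two small functions. This is a hypothetical standalone sketch, not the Python immediate word the compiler registers. `record_macro`, `expand_macro`, and the local `ParseError` are illustrative names. The sketch also assumes that a bare decimal token right after the macro name is the parameter count.

```python
class ParseError(Exception):
    """Stand-in for the compiler's parse error (illustrative)."""


def record_macro(tokens: list[str]) -> tuple[str, int, list[str], list[str]]:
    """Consume 'macro name [count] body ;' and return (name, count, body, rest)."""
    if len(tokens) < 2 or tokens[0] != "macro":
        raise ParseError("expected 'macro <name>'")
    name, i, count = tokens[1], 2, 0
    # Assumption: a bare decimal token right after the name is the parameter count.
    if i < len(tokens) and tokens[i].isdigit():
        count, i = int(tokens[i]), i + 1
    body: list[str] = []
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == ";":
            return name, count, body, tokens[i:]
        if tok == "macro":
            raise ParseError("macro definitions cannot nest")
        body.append(tok)
    raise ParseError(f"macro {name!r} is missing its closing ';'")


def expand_macro(body: list[str], count: int, args: list[str]) -> list[str]:
    """Substitute $0, $1, ... with the positional argument tokens."""
    if len(args) != count:
        raise ParseError(f"expected {count} argument(s), got {len(args)}")
    out: list[str] = []
    for tok in body:
        if tok.startswith("$") and tok[1:].isdigit():
            index = int(tok[1:])
            if index >= count:
                raise ParseError(f"{tok} exceeds the parameter count {count}")
            out.append(args[index])
        else:
            out.append(tok)
    return out
```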
7. Foreign Code, Inline Assembly, and Syscalls
- `:asm name { ... } ;` – Defines a word entirely in NASM syntax. The body is copied verbatim into the output and terminated with `ret`. If `keystone-engine` is installed, `:asm` words also execute at compile time; the VM marshals `(addr len)` string pairs by scanning for `data_start`/`data_end` references.
- `:py` intrinsics – As above, `intrinsic(builder)` can emit custom assembly without going through the normal AST.
- `extern` – Two forms:
  - Raw: `extern foo 2 1` marks `foo` as taking two stack arguments and returning one value. The emitter simply emits `call foo`.
  - C-style: `extern double atan2(double y, double x)` parses the signature, loads integer arguments into `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9` and floating arguments into `xmm0`..`xmm7`, aligns `rsp`, sets `al` to the number of SSE arguments, and pushes the result from `xmm0` or `rax`. Only the System V register argument slots (six integer, eight SSE) are supported.
- Syscalls – The built-in word `syscall` expects `(argN ... arg0 count nr -- ret)`. It clamps the count to `[0, 6]`, loads arguments into `rdi`, `rsi`, `rdx`, `r10`, `r8`, `r9`, executes `syscall`, and pushes `rax`. `stdlib/linux.sl` auto-generates macros of the form `syscall.write` → `3 1` (argument count, then syscall number) plus `.num`/`.argc` helpers. It also provides assembly-only `syscall1`–`syscall6` macros so the module works without the rest of the stdlib. `tests/syscall_write.sl` demonstrates the intended usage.
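C-style `extern` register assignment follows the System V AMD64 rules. Integer and floating-point parameters each consume their own register sequence, independently of one another. The sketch below models only that assignment, and only for simple one-word types. `plan_extern` is a hypothetical name. The sketch ignores `rsp` alignment and only parses prototypes whose return type is a single word.

```python
import re

INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SSE_REGS = [f"xmm{i}" for i in range(8)]
FLOAT_TYPES = {"float", "double"}
SIGNATURE = re.compile(r"\s*(\w+)\s+(\w+)\s*\((.*)\)\s*")


def plan_extern(signature: str) -> dict:
    """Assign each parameter of a simple C prototype to a System V argument register."""
    match = SIGNATURE.fullmatch(signature)
    if match is None:
        raise ValueError(f"unsupported signature: {signature!r}")
    ret_type, name, params = match.groups()
    params = params.strip()
    param_list = [] if params in ("", "void") else [p.strip() for p in params.split(",")]
    next_int = next_sse = 0
    placement: list[str] = []
    for param in param_list:
        ptype = param.split()[0]  # "const char *s" counts as integer class
        if ptype in FLOAT_TYPES:
            if next_sse == len(SSE_REGS):
                raise ValueError("out of SSE argument registers")
            placement.append(SSE_REGS[next_sse])
            next_sse += 1
        else:
            if next_int == len(INT_REGS):
                raise ValueError("out of integer argument registers")
            placement.append(INT_REGS[next_int])
            next_int += 1
    return {
        "name": name,
        "args": placement,
        "al": next_sse,  # number of SSE arguments, as described above
        "result": "xmm0" if ret_type in FLOAT_TYPES else "rax",
    }
```

For example, `double ldexp(double x, int exp)` places its arguments in `xmm0` and `rdi`, not `xmm0` and `rsi`, because the two sequences advance separately.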
8. Standard Library Overview (stdlib/)
- `core.sl` – Stack shuffles, integer arithmetic, comparisons, boolean ops, memory access, syscall stubs (`mmap`, `munmap`, `exit`), argument helpers (`argc`, `argv`, `argv@`), and pointer helpers (`mem`).
- `control.sl` – Optional custom control-structure words (`if`, `else`, `for`, `while`, `do`) that can override parser defaults when imported.
- `mem.sl` – `alloc`/`free` wrappers around `mmap`/`munmap` plus a byte-wise `memcpy` used by higher-level utilities.
- `io.sl` – `read_file`, `write_file`, `read_stdin`, `write_buf`, `ewrite_buf`, `putc`, `puti`, `puts`, `eputs`.
- `utils.sl` – String and number helpers (`strcmp`, `strconcat`, `strlen`, `digitsN>num`, `toint`, `count_digits`, `tostr`).
- `arr.sl` – Dynamically sized qword arrays with `arr_new`, `arr_len`, `arr_cap`, `arr_data`, `arr_push`, `arr_pop`, `arr_reserve`, `arr_free`; built-in static-array sorting via `arr_sort`/`arr_sorted`; and dynamic-array sorting via `dyn_arr_sort`/`dyn_arr_sorted`.
- `float.sl` – SSE-based double-precision arithmetic (`f+`, `f-`, `f*`, `f/`, `fneg`, comparisons, `int>float`, `float>int`, `fput`, `fputln`).
- `linux.sl` – Auto-generated syscall macros (one constant block per entry in `syscall_64.tbl`) plus the `syscallN` helpers implemented purely in assembly so the file can be used in isolation.
- `debug.sl` – Diagnostics and checks such as `dump`, `rdump`, `int3`, runtime `assert` (prints `assertion failed` and exits with code 1), `assert_msg` (message + condition; exits with the message when false), `abort` (prints `abort` and exits with code 1), and `abort_msg` (prints a caller-provided message and exits with code 1).
- `stdlib.sl` – Convenience aggregator that imports `core`, `mem`, `io`, and `utils` so most programs can simply `import stdlib/stdlib.sl`.
9. Testing and Usage Patterns
- Automated coverage – `python test.py` compiles every `tests/*.sl`, runs the generated binary, and compares stdout against `<name>.expected`. Optional companions include `<name>.stdin` (piped to the process), `<name>.args` (extra CLI args parsed with `shlex`), `<name>.stderr` (expected stderr), and `<name>.meta.json` (per-test knobs such as `expected_exit`, `expect_compile_error`, or `env`). The `extra_tests/` folder ships with curated demos (`extra_tests/ct_test.sl`, `extra_tests/args.sl`, `extra_tests/c_extern.sl`, `extra_tests/fn_test.sl`, `extra_tests/nob_test.sl`) that run alongside the core suite; pass `--extra path/to/foo.sl` to cover more standalone files. Use `python test.py --list` to see descriptions and `python test.py --update foo` to bless outputs after intentional changes. Add `--ct-run-main` when invoking the harness to run each test's `main` at compile time as well; capture that stream with `<name>.compile.expected` if you want automated comparisons.
- Common commands –
  - `python test.py` (run the whole suite)
  - `python test.py hello --update` (re-bless a single test)
  - `python test.py --ct-run-main hello` (compile/run a single test while also exercising `main` on the compile-time VM)
  - `python main.py tests/hello.sl -o build/hello && ./build/hello`
  - `python main.py program.sl --emit-asm --temp-dir build`
  - `python main.py --repl`
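The companion-file convention used by `test.py` can be summarized as a loader plus a comparison step. The Python sketch below is hypothetical. It does not compile or run anything, and `SampleCase`, `load_case`, and `check` are illustrative names. It assumes a missing `expected_exit` means exit code 0, and it ignores the `env` and `expect_compile_error` knobs.

```python
import json
import shlex
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class SampleCase:
    source: Path
    expected_stdout: Optional[str] = None
    stdin: Optional[bytes] = None
    args: list[str] = field(default_factory=list)
    expected_stderr: Optional[str] = None
    meta: dict = field(default_factory=dict)


def load_case(source: Path) -> SampleCase:
    """Collect the optional companion files that sit next to tests/<name>.sl."""

    def companion(suffix: str) -> Path:
        return source.with_name(source.stem + suffix)

    case = SampleCase(source)
    if companion(".expected").exists():
        case.expected_stdout = companion(".expected").read_text()
    if companion(".stdin").exists():
        case.stdin = companion(".stdin").read_bytes()
    if companion(".args").exists():
        # Extra CLI arguments are split with shell-like quoting rules.
        case.args = shlex.split(companion(".args").read_text())
    if companion(".stderr").exists():
        case.expected_stderr = companion(".stderr").read_text()
    if companion(".meta.json").exists():
        case.meta = json.loads(companion(".meta.json").read_text())
    return case


def check(case: SampleCase, stdout: str, stderr: str, returncode: int) -> list[str]:
    """Return the list of mismatches; an empty list means the case passed."""
    problems: list[str] = []
    if case.expected_stdout is not None and stdout != case.expected_stdout:
        problems.append("stdout mismatch")
    if case.expected_stderr is not None and stderr != case.expected_stderr:
        problems.append("stderr mismatch")
    # Assumption: without an expected_exit knob the binary should exit with 0.
    expected_exit = case.meta.get("expected_exit", 0)
    if returncode != expected_exit:
        problems.append(f"exit code {returncode} != {expected_exit}")
    return problems
```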