nForth is a minimal, indirect-threaded, experimental Forth with a tiny footprint (4KB).
Latest commit: f76efb0bfb by StackSmith, "parsing environment variables" (2023-10-01 22:10:04 -04:00)

Files (last commit message, date):
tools: "checked in the toolchain" (2023-08-31 19:56:26 -04:00)
vimrepl: "vimrepl" (2023-09-16 14:40:12 -04:00)
LICENSE.txt: "LICENSE.txt and config.asm" (2023-09-04 13:39:38 -04:00)
Makefile: "Alignment is now 2; low bit is still IMMEDIATE" (2023-09-08 11:25:09 -04:00)
README.md: "Mentioned sync and abort in README" (2023-09-14 19:31:05 -04:00)
build: "checked in the toolchain" (2023-08-31 19:56:26 -04:00)
cgi.f: "parsing environment variables" (2023-10-01 22:10:04 -04:00)
config.asm: "separated endless interpret lop into INTERP; both OUTER and LOADH now use it (LOADH failed)" (2023-10-01 19:17:40 -04:00)
macros.asm: "BROKEN! trying to change alignment to 2:" (2023-09-08 10:16:31 -04:00)
nf.f: "parsing environment variables" (2023-10-01 22:10:04 -04:00)
nforth.asm: "parsing environment variables" (2023-10-01 22:10:04 -04:00)
nforth.f: "separated endless interpret lop into INTERP; both OUTER and LOADH now use it (LOADH failed)" (2023-10-01 19:17:40 -04:00)
tstamp.f: "cr before printing error line on error to match error position" (2023-09-19 11:04:42 -04:00)


nForth - (Work In Progress)

Summary: a tiny (4KB core), indirect-threaded, experimental Forth-like language for i386, written in FASM. nForth is entirely self-contained: it requires no external libraries, or anything else beyond a way to read and write.

Target Platform: Currently i386 Linux, for scripting, CGI and farting around.

Status

WIP. Working so far: defining and running simple code; loading and compiling secondary files.

Deviations from Forth

nForth is not ANS (or anything) compliant. Major deviations:

Always Compile

To avoid the whole state morass, nForth is always in compile mode: text is parsed and compiled. To actually execute something, surround the code with square brackets; such code is erased after it runs.

OK> [ " Hello" type cr ]

Sync points

When the dictionary is in a good state, create a checkpoint with the immediate word sync. To revert to the previous checkpoint, use abort. The convenience word .. executes all code compiled since the previous checkpoint and then deletes it, so the previous example can also be written as:

OK> sync
OK> " Hello" type cr ..
Hello
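A minimal C sketch of the checkpoint idea. It assumes (the text above does not say so) that a checkpoint records both HERE and the head of the dictionary chain, called latest here, so that abort also forgets any words defined after sync. The names dict_state, sync_point, abort_to_sync and define_word are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the dictionary state a checkpoint has to capture. */
typedef struct {
    uint32_t here;   /* next free dictionary address */
    uint32_t latest; /* newest head: where dictionary searches start */
} dict_state;

static dict_state dict  = { 0x1000u, 0u };
static dict_state saved = { 0x1000u, 0u };

/* sync: remember the current, known-good state */
static void sync_point(void) { saved = dict; }

/* abort: roll back everything compiled or defined since the last sync */
static void abort_to_sync(void) { dict = saved; }

/* Stand-in for a defining word: link a new 12-byte word in at HERE. */
static void define_word(void) {
    assert(dict.here % 2 == 0); /* heads stay 2-byte aligned */
    dict.latest = dict.here;
    dict.here += 12;
}
```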

Hashes, not Names

Symbolic names are not stored in the dictionary. Instead, each name is hashed at definition time to a 32-bit value using FNV-1a, and the hash is stored in the word's head. A dictionary search is therefore a single 32-bit comparison per word rather than a string comparison, and every head is exactly 8 bytes long.
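For reference, here is 32-bit FNV-1a sketched in C with the standard FNV offset basis and prime. It assumes nForth hashes the raw bytes of each name exactly as typed (no case folding); the function name fnv1a32 is illustrative, not taken from the nForth source. Because a Forth parser hands over counted strings, the sketch takes an explicit length instead of relying on a NUL terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV-1a parameters */
#define FNV32_OFFSET_BASIS 0x811C9DC5u
#define FNV32_PRIME        0x01000193u

/* Hash a counted (not NUL-terminated) name. */
static uint32_t fnv1a32(const char *name, size_t len) {
    assert(name != NULL || len == 0);
    uint32_t h = FNV32_OFFSET_BASIS;
    for (size_t i = 0; i < len; i++) {
        h ^= (uint8_t)name[i]; /* FNV-1a: XOR the byte in first... */
        h *= FNV32_PRIME;      /* ...then multiply (FNV-1 does the reverse) */
    }
    return h;
}
```

A lookup then compares fnv1a32(token, len) with each head's HASH field. The trade-off of storing only hashes is that two different names with the same 32-bit hash cannot be told apart.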

Control structures

Conditionals are built on 0BRANCH (work in progress); the word thanx ends a conditional where standard Forth would use THEN:

... if <ifclause> thanx ...
... if <ifclause> else <elseclause> thanx ...
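Words like these are commonly built on 0BRANCH by compile-time backpatching: if lays down 0BRANCH with an empty target and remembers where that target cell is, else lays down an unconditional BRANCH and resolves the pending target, and thanx resolves whatever is still open. The C sketch below models that on an array of cells. The token codes, the BRANCH primitive and the use of absolute cell indices as branch targets are assumptions for illustration; nForth may encode targets differently (for example as relative offsets).

```c
#include <assert.h>
#include <stdint.h>

/* Token codes invented for this sketch. */
enum { T_LIT, T_0BRANCH, T_BRANCH, T_EXIT };

static int32_t code[64];
static int here = 0;

static void comma(int32_t v) { assert(here < 64); code[here++] = v; }

/* if: 0BRANCH plus an unresolved target cell; return the cell to patch later */
static int c_if(void) { comma(T_0BRANCH); comma(0); return here - 1; }

/* else: BRANCH over the else-clause, and aim the pending 0BRANCH at the
   else-clause that follows */
static int c_else(int hole) {
    comma(T_BRANCH); comma(0);
    code[hole] = here;
    return here - 1;
}

/* thanx: resolve the still-open branch to the current position */
static void c_thanx(int hole) { code[hole] = here; }

/* Tiny inner loop: start with flag on the stack, return TOS at EXIT */
static int32_t run(int32_t flag) {
    int32_t stack[16];
    int sp = 0, ip = 0;
    stack[sp++] = flag;
    for (;;) {
        switch (code[ip++]) {
        case T_LIT: stack[sp++] = code[ip++]; break;
        case T_0BRANCH: {
            int32_t target = code[ip++];
            if (stack[--sp] == 0) ip = target; /* 0BRANCH consumes the flag */
            break;
        }
        case T_BRANCH: ip = code[ip]; break;
        case T_EXIT: return stack[sp - 1];
        default: return -1; /* unknown token */
        }
    }
}

/* Compile the equivalent of:  if 111 else 222 thanx  */
static void compile_example(void) {
    here = 0;
    int hole = c_if();
    comma(T_LIT); comma(111);
    hole = c_else(hole);
    comma(T_LIT); comma(222);
    c_thanx(hole);
    comma(T_EXIT);
}
```

Without else, thanx simply resolves the hole returned by c_if.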

Error handling

... ERR.CATCH if <error handler> else <protected code> ERR.CLR thanx ...
  or
... ERR.CATCH dup if <dispatch on error code...> else drop <protected code> ERR.CLR thanx ...

ERR.CATCH initially returns 0, so the protected code runs until ERR.CLR is executed or an ERXIT is triggered. On an ERXIT, execution re-enters the code following ERR.CATCH with the (hopefully non-zero) error code as the return value; ERR.CLR is implied. ERR.CLR restores the previous error handler.

It is up to you to assign error codes that make sense to the handler that follows, and to invoke ERXIT with a non-zero code in TOS.
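ERR.CATCH and ERXIT behave much like C's setjmp and longjmp over a stack of handlers. The following is a hypothetical C model of the semantics described above, not nForth's implementation: err_push followed by setjmp plays ERR.CATCH, err_clr plays ERR.CLR, and erxit reinstates the previous handler (the implied ERR.CLR) before unwinding to the active one. All names, and the error code 7, are made up for the example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical model: each ERR.CATCH pushes a handler; handlers form a stack. */
typedef struct handler {
    jmp_buf env;
    struct handler *prev;
} handler;

static handler *current = NULL; /* innermost active handler */
static int err_code = 0;        /* code passed to the last erxit */

/* First half of ERR.CATCH: make h the active handler (setjmp is the other half). */
static void err_push(handler *h) { h->prev = current; current = h; }

/* ERR.CLR: protected code finished normally; reinstate the previous handler. */
static void err_clr(void) { assert(current != NULL); current = current->prev; }

/* ERXIT: pop the active handler (the implied ERR.CLR) and unwind to it. */
static void erxit(int code) {
    handler *h = current;
    assert(h != NULL && code != 0);
    current = h->prev;
    err_code = code;
    longjmp(h->env, 1);
}

/* Shape of: ERR.CATCH if <error handler> else <protected code> ERR.CLR thanx */
static int checked_div(int a, int b, int *out) {
    handler h;
    err_push(&h);
    if (setjmp(h.env) != 0)
        return err_code;        /* handler: previous handler already restored */
    if (b == 0)
        erxit(7);               /* made-up error code for division by zero */
    *out = a / b;
    err_clr();
    return 0;
}
```

setjmp is kept as the whole operand of a comparison with a constant because C only guarantees its behaviour in such restricted contexts; the error code travels through a static variable so it is not lost across the longjmp.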

Getting Started

The fasm toolchain is in the tools/ directory. Simply run 'make' to build the nforth executable.

The file nforth.f is Forth source that gets embedded into the image (appended after the kernel). Normally it defines the word included and uses it to include nf.f, which provides conditionals, debugging and interactive support. However, you may change nforth.f to embed anything you want, such as a CGI script, making the executable self-contained.

Input defaults to hexadecimal.

The word sys is useful for debugging. The output looks like:

DSP      RSP      HERE     RUN      SRC
09F56FF4 FFB37A24 09F38008 09F38000 09F56003
00000000 00000002 00000001

The three numbers at the bottom are the top three values on the data stack, with TOS leftmost.

dump prints a hex dump of memory at the address in TOS.

Internals

Tokens are 32-bit references to each word's XT. NEXT is the 3-byte sequence that terminates every CODE word:

lodsd
jmp [eax]
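To make indirect threading concrete, here is a small C model. Each word begins with a code pointer, threaded code is an array of pointers to words (the XTs), and next_loop does what lodsd; jmp [eax] does: fetch the XT at IP, advance IP, and transfer control through the code pointer stored there. Unlike the real NEXT, which jumps and never returns, the C version calls each code routine and loops. Passing w along mirrors EAX still holding the XT, which is how a DOCOL-style routine typically finds the body; the word set and the DOCOL/EXIT details are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: an XT points at a word whose first field is its code pointer. */
typedef struct word word;
typedef void (*code_fn)(const word *w);
struct word {
    code_fn code;             /* DOCOL, a primitive, ... */
    const word *const *body;  /* threaded code, for colon definitions */
    intptr_t value;           /* payload, for constants */
};

static const word *const *ip;          /* plays the role of ESI */
static const word *const *rstack[32];  /* return stack */
static int rsp;
static intptr_t dstack[32];            /* data stack */
static int dsp;
static int running;

/* NEXT ("lodsd; jmp [eax]"): fetch the XT at IP, advance IP, go through its code pointer. */
static void next_loop(void) {
    while (running) {
        const word *w = *ip++;
        w->code(w);
    }
}

static void docol(const word *w) { assert(rsp < 32); rstack[rsp++] = ip; ip = w->body; }
/* EXIT: return to the caller, or stop when leaving the top level */
static void do_exit(const word *w) { (void)w; if (rsp == 0) running = 0; else ip = rstack[--rsp]; }
static void do_const(const word *w) { assert(dsp < 32); dstack[dsp++] = w->value; }
static void do_plus(const word *w) { (void)w; assert(dsp >= 2); dsp--; dstack[dsp - 1] += dstack[dsp]; }

static const word W_EXIT  = { do_exit, NULL, 0 };
static const word W_PLUS  = { do_plus, NULL, 0 };
static const word W_TWO   = { do_const, NULL, 2 };
static const word W_THREE = { do_const, NULL, 3 };

/* : add2  2 + ; */
static const word *const ADD2_BODY[] = { &W_TWO, &W_PLUS, &W_EXIT };
static const word W_ADD2 = { docol, ADD2_BODY, 0 };

/* top-level threaded code:  3 add2 */
static const word *const MAIN[] = { &W_THREE, &W_ADD2, &W_EXIT };

static intptr_t run(const word *const *program) {
    ip = program; rsp = 0; dsp = 0; running = 1;
    next_loop();
    return dsp > 0 ? dstack[dsp - 1] : 0;
}
```

run(MAIN) evaluates the equivalent of 3 add2 and leaves 5 on the data stack.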

Register usage

It follows that ESI serves as the Instruction Pointer (IP) and that EAX is a scratch register trashed by NEXT.

ESP is the return stack pointer, while EBP holds the data stack pointer. For data-stack operations EBP and ESP are swapped, so that ESP temporarily points at the data stack and push and pop operate on it. ESP must be swapped back to the return stack before any control transfer. Take special care at branch targets to ensure that the stacks are in a known state.

reg   purpose                      must preserve
eax   inner interpreter; trashed   no
ebx   TOS
ecx   scratch                      no
edx   scratch                      no
esi   IP                           yes
edi   --                           yes
ebp   DSP                          yes
esp   RSP

Header format

Heads are always aligned to a 2-byte boundary, so bit 0 of the link pointer is free and is used as a flag. AND the link with $FFFFFFFE to clear that bit before following it.

offset  size  description
-8      4     FLAG (bit 0) and LINK (pointer; AND with $FFFFFFFE)
-4      4     HASH: hash of the name
 0      4     code pointer (DOCOL, etc.)   <-- entry pointer
+4      ...   definition or data

FLAG  compile-time behavior
0     normally compiled word
1     IMMEDIATE: executed at compile time
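A C sketch of this layout and of the hash-based search, using byte offsets into an array in place of real i386 addresses. It assumes that LINK points at the previous word's FLAG/LINK cell and that a LINK of 0 ends the chain; the source does not say either, so treat both as hypothetical, along with the names head and find.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLAG_IMMEDIATE 1u          /* bit 0 of the FLAG/LINK cell */
#define LINK_MASK      0xFFFFFFFEu /* clears the flag to recover the link */

/* Byte offsets stand in for addresses; offset 0 is never a head, so a LINK
   of 0 can terminate the chain (an assumption of this sketch). */
static uint8_t mem[512];
static uint32_t here = 2;    /* 2-byte aligned */
static uint32_t latest = 0;  /* offset of the newest head's FLAG/LINK cell */

static void put32(uint32_t a, uint32_t v) { memcpy(&mem[a], &v, sizeof v); }
static uint32_t get32(uint32_t a) { uint32_t v; memcpy(&v, &mem[a], sizeof v); return v; }

/* Lay down FLAG/LINK (entry-8), HASH (entry-4) and the code pointer (entry+0).
   Returns the entry pointer. */
static uint32_t head(uint32_t hash, uint32_t codeptr, int immediate) {
    uint32_t h = here;
    assert((h & 1u) == 0 && h + 12 <= sizeof mem);
    put32(h, latest | (immediate ? FLAG_IMMEDIATE : 0u));
    put32(h + 4, hash);
    put32(h + 8, codeptr);
    latest = h;
    here = h + 12;  /* a real word's body would continue from entry+4 */
    return h + 8;
}

/* Search newest-first by comparing 32-bit hashes; returns 0 if not found. */
static uint32_t find(uint32_t hash, int *immediate) {
    for (uint32_t h = latest; h != 0; h = get32(h) & LINK_MASK) {
        if (get32(h + 4) == hash) {
            if (immediate != NULL)
                *immediate = (int)(get32(h) & FLAG_IMMEDIATE);
            return h + 8;
        }
    }
    return 0;
}
```

Masking with LINK_MASK on every step is what lets the IMMEDIATE flag share a cell with the link.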

Compilation Semantics

Entered text is split into tokens at whitespace boundaries; each token is hashed and the hash is looked up. Immediate words are executed; everything else is (currently) compiled. Each compiled unit is then executed right away, which makes interactive use from the prompt possible.

At compile start, the current HERE is saved in the RUN.PTR variable. At the end of compilation, the code at RUN.PTR is executed and HERE is reset to RUN.PTR, erasing the just-compiled and executed (presumably command-line) code. Defining words advance RUN.PTR along with HERE so that the new definition is preserved. There is no interpret state: nForth always compiles, then executes from RUN.PTR.
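The compile-then-run cycle fits in a few lines of C. In this hypothetical sketch, dictionary cells hold C function pointers standing in for XTs, begin_line and end_line bracket one line of input, and keep_definition is what a defining word would do to move RUN.PTR past its new definition. None of these names come from nForth.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: cells hold C functions standing in for XTs. */
typedef void (*xt_fn)(void);

static xt_fn cells[256];
static uint32_t here = 0;     /* HERE */
static uint32_t run_ptr = 0;  /* RUN.PTR */

static void compile(xt_fn xt) { assert(here < 256); cells[here++] = xt; }

/* Compile start: remember where this line's code begins. */
static void begin_line(void) { run_ptr = here; }

/* A defining word, once its definition is complete: move RUN.PTR past it
   so end_line neither runs nor erases it. */
static void keep_definition(void) { run_ptr = here; }

/* End of compilation: execute from RUN.PTR, then reset HERE to erase it. */
static void end_line(void) {
    for (uint32_t i = run_ptr; i < here; i++)
        cells[i]();
    here = run_ptr;
}

/* A word with a visible effect, for the example */
static int hits = 0;
static void hit(void) { hits++; }
```

A line such as [ " Hello" type cr ] corresponds to begin_line, a few compile calls and end_line: its code runs once and HERE ends up where it started.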

Hacking

To define new words in the core:

HEAD <symname>, <codeptrtype>, <opt.flags>
  or
HEADN <symname>, <forthname>, <codeptrtype>, <opt.flags>

The second form lets the Forth name differ from the assembler symbol, since the assembler cannot use names like "+" as symbols. <codeptrtype> may be docol for procedures, dovar for variables, $+4 for code primitives, or anything else that makes sense. <opt.flags>, when 1, marks the word as immediate.

Words written in assembly (codeptr=$+4) must end with the NEXT macro.