nForth - (Work In Progress)
Summary: a tiny (4KB core), experimental, indirect-threaded Forth-like language for i386, written in FASM. nForth is entirely self-contained: it requires no outside libraries, only a way to read and write.
Target Platform: Currently i386 Linux, for scripting, CGI and farting around.
Status
WIP. Working so far: defining and running simple code, and loading/compiling secondary files.
Deviations from Forth
nForth is not ANS (or anything) compliant. Major deviations:
Always Compile
To avoid the whole state morass, nForth is always in compile mode. Text is parsed and compiled. To actually execute anything, surround the code with square brackets (such code is erased after execution).
OK> [ " Hello" type cr ]
Sync points
When the dictionary is in a good state, create a checkpoint with the immediate word `sync`. To revert to the previous checkpoint, use `abort`. `..` is a convenience word that executes all code compiled after the previous checkpoint and deletes it, so instead of the previous example you can do:
OK> sync
OK> " Hello" type cr ..
Hello
Hashes, not Names
Symbolic names are not stored in the dictionary. Names are instead hashed at definition time using FNV1a to a 32-bit value and stored in heads. Dictionary searches are simplified to a 32-bit comparison instead of a string comparison, and heads are always 8 bytes long.
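To make the lookup idea concrete, here is a minimal sketch of 32-bit FNV-1a in Python. The constants are the standard FNV ones; the function names are made up for illustration, and whether nForth hashes the raw name bytes exactly like this (case, length limits, and so on) is an assumption, not something taken from nforth.asm.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME = 0x01000193         # standard 32-bit FNV prime


def fnv1a32(name: bytes) -> int:
    """Return the 32-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET_BASIS
    for byte in name:
        h ^= byte                         # FNV-1a: XOR the byte in first...
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # ...then multiply, wrapping to 32 bits
    return h


def same_word(hash_in_head: int, typed_name: bytes) -> bool:
    """A dictionary match is one integer comparison, not a string compare."""
    return hash_in_head == fnv1a32(typed_name)


print(hex(fnv1a32(b"a")))  # 0xe40c292c
```

The trade-off is that two different names can collide on the same 32-bit value, and a name cannot be recovered from its head.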
Control structures
`0BRANCH` may be used to construct conditionals (work in progress)
... if <ifclause> thanx ...
... if <ifclause> else <elseclause> thanx
Error handling
... ERR.CATCH if <error handler> else <protected code> ERR.CLR thanx ...
or
... ERR.CATCH dup if <dispatch on error code...> else drop <protected code> ERR.CLR thanx
`ERR.CATCH` returns 0 initially and runs the protected code until `ERR.CLR` is executed (or an `ERXIT` is triggered). In case of an `ERXIT`, the code following `ERR.CATCH` is reentered with a non-zero (hopefully) error code (`ERR.CLR` is implied). Upon `ERR.CLR`, the previous error handler is restored.
It is up to you to assign error codes that make sense to the handler that follows, and to invoke `ERXIT` with a non-zero code in TOS.
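For readers who think in terms of exceptions, the control flow above can be approximated in Python. This is only an analogy: nForth re-enters the code after ERR.CATCH rather than unwinding through a try block, and the names `Erxit` and `err_catch` are invented for this sketch.

```python
class Erxit(Exception):
    """Stand-in for ERXIT: carries the non-zero error code taken from TOS."""

    def __init__(self, code: int):
        if code == 0:
            raise ValueError("0 means 'no error'; ERXIT needs a non-zero code")
        super().__init__(code)
        self.code = code


def err_catch(protected, handler):
    """Run protected(); if it raises Erxit, run handler(code) instead."""
    try:
        result = protected()        # ERR.CATCH returned 0: protected branch
    except Erxit as exit_:
        return handler(exit_.code)  # re-entered with the error code
    return result                   # normal completion, i.e. ERR.CLR


def risky():
    raise Erxit(2)


def inner():
    # The inner handler only understands code 1 and passes others outward.
    def handle(code):
        if code == 1:
            return "inner handled 1"
        raise Erxit(code)

    return err_catch(risky, handle)


outcome = err_catch(inner, lambda code: f"outer handled {code}")
print(outcome)  # outer handled 2
```

Leaving the inner `err_catch` puts the outer handler back in charge, which mirrors the "previous error handler is restored" behavior.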
Getting Started
The fasm toolchain is in the tools/ directory. Simply run 'make' to create the `nforth` executable.
File `nforth.f` is a Forth file which gets embedded into the image (appended after the kernel). Normally it defines the word `included;` and uses it to include `nf.f`, which contains conditionals, debugging and interactive support. However, you may alter nforth.f to embed anything you want - a cgi script, for instance, making the executable self-contained.
Input defaults to hexadecimal.
The word `sys` is useful for debugging. The output looks like:
DSP RSP HERE RUN SRC
09F56FF4 FFB37A24 09F38008 09F38000 09F56003
00000000 00000002 00000001
The three numbers at the bottom are the top three values on the data stack, leftmost being TOS.
`dump` may be used to produce a hex dump of memory at the address in TOS.
Internals
Tokens are 32-bit references to the XT of each word. NEXT is a 3-byte piece of code that terminates all CODE words:
lodsd
jmp [eax]
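If indirect threading is new to you, the following toy model may help. It is a sketch in Python, not a description of nforth.asm: memory is a list of cells rather than bytes, "machine code" is a table of Python callables, and every name here (`ThreadedVM`, `code`, `colon`, `BYE`) is hypothetical.

```python
class ThreadedVM:
    """Toy indirect-threaded machine; memory is a list of cells, not bytes."""

    def __init__(self):
        self.mem = []    # dictionary space: code fields and token lists
        self.prims = []  # stand-ins for machine-code routines
        self.data = []   # data stack
        self.ret = []    # return stack
        self.ip = 0      # instruction pointer (ESI in nForth)
        self.running = False
        self.DOCOL = self._routine(self._docol)
        self.EXIT = self.code(self._exit)
        self.BYE = self.code(self._bye)

    def _routine(self, fn):
        self.prims.append(fn)
        return len(self.prims) - 1

    def code(self, fn):
        """Lay down a primitive: a single code-field cell. Returns its XT."""
        xt = len(self.mem)
        self.mem.append(self._routine(fn))
        return xt

    def colon(self, *tokens):
        """Lay down a colon word: DOCOL code field, the tokens, then EXIT."""
        xt = len(self.mem)
        self.mem.append(self.DOCOL)
        self.mem.extend(tokens)
        self.mem.append(self.EXIT)
        return xt

    def next(self):
        w = self.mem[self.ip]       # lodsd: fetch the token at IP...
        self.ip += 1                # ...and advance IP
        self.prims[self.mem[w]](w)  # jmp [eax]: jump through the code field

    def _docol(self, w):
        self.ret.append(self.ip)    # save the caller's IP
        self.ip = w + 1             # continue in the body after the code field

    def _exit(self, w):
        self.ip = self.ret.pop()    # resume the caller

    def _bye(self, w):
        self.running = False

    def run(self, xt):
        start = len(self.mem)
        self.mem.extend([xt, self.BYE])  # temporary two-token thread
        self.ip = start
        self.running = True
        while self.running:
            self.next()
        del self.mem[start:]


vm = ThreadedVM()
two = vm.code(lambda w: vm.data.append(2))
dup = vm.code(lambda w: vm.data.append(vm.data[-1]))
mul = vm.code(lambda w: vm.data.append(vm.data.pop() * vm.data.pop()))
square = vm.colon(dup, mul)
four = vm.colon(two, square)
vm.run(four)
print(vm.data)  # [4]
```

The point to notice is the double indirection in `next`: the token names a word, and the word's first cell names the routine that knows how to run it.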
Register usage
It follows that ESI is used as Instruction Pointer and EAX is a scratch register, trashed by NEXT.
ESP is used as the return stack pointer, while EBP holds the data stack pointer. For data operations, EBP and ESP are swapped, so ESP temporarily points at the data stack. ESP must be restored prior to control transfers. Special care must be taken at branch targets to ensure that the stack is in a known state.
reg | purpose | must preserve |
---|---|---|
eax | inner, trashed | no |
ebx | TOS | |
ecx | scratch | no |
edx | scratch | no |
esi | IP | yes |
edi | -- | yes |
ebp | DSP | yes |
Header format
Heads are always aligned to a 2-byte boundary. The link pointer must be ANDed with $FFFFFFFE to clear the low bit, which is used as a flag.
offset | size | Description |
---|---|---|
-8 | 4 | LINK - pointer (and 0xFFFFFFFE); FLAG - low bit |
-4 | 4 | HASH - hash of the name |
<------ | Entry Pointer | |
0 | 4 | Code Pointer (DOCOL, etc) |
+4 | ... | definition or data |
Flag bit | Compile-time behavior |
---|---|
0 | Normally compiled word |
1 | IMMediate; execute at compile-time |
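As a rough illustration of the layout in the tables above, this Python sketch packs and unpacks the 8 bytes that precede the entry pointer. It assumes two little-endian 32-bit cells (LINK with the flag in its low bit, then HASH), which matches i386 and the table; the function names and the sample values are made up.

```python
import struct

FLAG_IMMEDIATE = 1       # low bit of the LINK cell
LINK_MASK = 0xFFFFFFFE   # clears the flag bit to recover the pointer


def pack_head(link: int, name_hash: int, immediate: bool = False) -> bytes:
    """Build the 8-byte head: LINK|FLAG cell followed by the HASH cell."""
    if link & 1:
        raise ValueError("link must be 2-byte aligned so the low bit is free")
    flag = FLAG_IMMEDIATE if immediate else 0
    return struct.pack("<II", link | flag, name_hash)


def unpack_head(head: bytes):
    """Return (link, immediate, name_hash) from an 8-byte head."""
    link_and_flag, name_hash = struct.unpack("<II", head)
    link = link_and_flag & LINK_MASK
    immediate = bool(link_and_flag & FLAG_IMMEDIATE)
    return link, immediate, name_hash


# Hypothetical values, only to show the round trip.
head = pack_head(0x09F38000, 0xE40C292C, immediate=True)
print(head.hex())  # 0180f3092c290ce4
```

Because heads are 2-byte aligned, bit 0 of a real link pointer is always zero, which is what makes it safe to borrow for the flag.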
Compilation Semantics
Entered text is tokenized at whitespace boundaries, hashed, and the hash looked up. Immediate words are executed; everything else is currently compiled. Each compiled unit is immediately executed to facilitate interactive usage from the prompt.
At compile start, the current `HERE` is stored in the `RUN.PTR` variable; at the end of compilation, the code at `RUN.PTR` is evaluated and `HERE` is restored, erasing the compiled and executed (presumably command-line) code. Defining words update `RUN.PTR` and `HERE` to ensure that the definition is preserved. There is no interpret state - we always compile and execute from `RUN.PTR`.
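The compile, run, then erase cycle can be sketched in a few lines of Python. This is a toy under stated assumptions: the dictionary is a list of callables, its length stands in for HERE, and the method names (`begin`, `compile`, `keep`, `finish`) are invented; `keep` stands in for whatever a real defining word does to RUN.PTR.

```python
class Dictionary:
    """Toy dictionary space; len(self.cells) plays the role of HERE."""

    def __init__(self):
        self.cells = []
        self.run_ptr = 0

    @property
    def here(self):
        return len(self.cells)

    def begin(self):
        self.run_ptr = self.here       # at compile start, remember HERE

    def compile(self, op):
        self.cells.append(op)          # everything is compiled, never interpreted

    def keep(self):
        self.run_ptr = self.here       # defining word: protect what was compiled

    def finish(self):
        for op in self.cells[self.run_ptr:]:
            op()                       # evaluate the code at RUN.PTR
        del self.cells[self.run_ptr:]  # restore HERE, erasing the one-shot code


out = []
d = Dictionary()

d.begin()
d.compile(lambda: out.append("Hello"))
d.finish()
print(out, d.here)  # ['Hello'] 0

d.begin()
d.compile(lambda: out.append("body of a new word"))  # part of a definition
d.keep()
d.compile(lambda: out.append("after"))
d.finish()
print(out, d.here)  # ['Hello', 'after'] 1
```

In the second run the definition body survives because `keep` moved the run pointer past it, while the trailing command-line code is executed once and discarded.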
Hacking
To define new words in the core:
HEAD <symname>, <codeptrtype>, <opt.flags>
or
HEADN <symname>, <forthname>, <codeptrtype>, <opt.flags>
The second form allows for a different symbolic name (the assembler has limitations for things like "+", etc).
<codeptrtype> may be `docol` for procedures, `dovar` for variables, `$+4` for code primitives, or anything else that makes sense.
<opt.flags>, when 1, marks the word as immediate.
Words written in assembly (codeptr=$+4) must end with the NEXT macro.