6618 - new docs
This commit is contained in:
parent
3ccb2c8328
commit
9a524793ee
798
README.md
798
README.md
|
@ -10,7 +10,7 @@ Running the code you want to run, and nothing else.
|
||||||
```sh
|
```sh
|
||||||
$ git clone https://github.com/akkartik/mu
|
$ git clone https://github.com/akkartik/mu
|
||||||
$ cd mu
|
$ cd mu
|
||||||
$ ./translate_mu apps/ex2.mu # emits a.elf
|
$ ./translate_mu apps/ex2.mu # emit a.elf
|
||||||
$ ./a.elf # adds 3 and 4
|
$ ./a.elf # adds 3 and 4
|
||||||
$ echo $?
|
$ echo $?
|
||||||
7
|
7
|
||||||
|
@ -18,10 +18,13 @@ Running the code you want to run, and nothing else.
|
||||||
|
|
||||||
[![Build Status](https://api.travis-ci.org/akkartik/mu.svg?branch=master)](https://travis-ci.org/akkartik/mu)
|
[![Build Status](https://api.travis-ci.org/akkartik/mu.svg?branch=master)](https://travis-ci.org/akkartik/mu)
|
||||||
|
|
||||||
Mu requires just a Unix-like OS and nothing else. The Mu translator is built
|
Rather than start from some syntax and introduce layers of translation to
|
||||||
up from machine code. You can also bootstrap it from C++. Both C++ and
|
implement it, Mu starts from the processor's instruction set and tries to get
|
||||||
self-hosted versions emit identical binaries. The generated binaries require
|
to _some_ safe and clear syntax with as few layers of translation as possible.
|
||||||
just a Unix-like kernel and nothing else. (There are more details in [this paper](http://akkartik.name/akkartik-convivial-20200607.pdf).)
|
The emphasis is on internal consistency at any point in time rather than
|
||||||
|
compatibility with the past. ([More details.](http://akkartik.name/akkartik-convivial-20200607.pdf))
|
||||||
|
|
||||||
|
Currently Mu requires a 32-bit x86 processor and Linux kernel.
|
||||||
|
|
||||||
## Goals
|
## Goals
|
||||||
|
|
||||||
|
@ -56,668 +59,84 @@ In priority order:
|
||||||
For now it's a thin veneer over machine code. I'm working on memory safety
|
For now it's a thin veneer over machine code. I'm working on memory safety
|
||||||
before expressive syntax.
|
before expressive syntax.
|
||||||
|
|
||||||
## Source Language
|
## Toolchain
|
||||||
|
|
||||||
Mu's main source language is [still being built](http://akkartik.name/post/mu-2019-2).
|
The Mu stack consists of:
|
||||||
When completed, it will be type- and memory-safe. At the moment it performs no
|
- the Mu type-safe language;
|
||||||
checks. Here's the program we translated above:
|
- SubX, an unsafe notation for a subset of x86 machine code; and
|
||||||
|
- _bare_ SubX, a more rudimentary form of SubX without certain syntax sugar.
|
||||||
|
|
||||||
<img alt='ex2.mu' src='html/ex2.mu.png'>
|
All Mu programs get translated through these layers, and the translators are
|
||||||
|
mostly built out of lower levels. The translator from Mu to SubX is written in
|
||||||
|
SubX, and the translator from SubX to bare SubX is built in bare SubX.
|
||||||
|
|
||||||
There are no expressions; functions consist of only statements that operate on
|
To translate Mu programs through these layers into tiny zero-dependency ELF
|
||||||
variables. Most statements in Mu translate to [a single machine-code instruction](http://akkartik.github.io/mu/html/mu_instructions.html).
|
binaries for Linux, use `translate_mu`.
|
||||||
Variables reside in memory by default. Programs must specify registers when
|
|
||||||
they want to use them. Functions must return results in registers. Execution
|
|
||||||
begins at the function `main`, which always returns its result in register
|
|
||||||
`ebx`. [The paper](http://akkartik.name/akkartik-convivial-20200607.pdf) has
|
|
||||||
more details, and there's a [summary](mu_summary) of all supported instructions.
|
|
||||||
|
|
||||||
## SubX
|
Mu programs can be run in emulated mode to emit traces, which permit time-travel
|
||||||
|
debugging. ([More details.](debugging.md))
|
||||||
|
|
||||||
Mu is written in [a notation for a subset of x86 machine code called SubX](http://akkartik.name/post/mu-2019-1).
|
### incomplete tools
|
||||||
Here's a SubX program (`apps/ex1.subx`) that returns 42:
|
|
||||||
|
The Mu translator is still a work in progress; not all incorrect programs
|
||||||
```sh
|
result in good error messages.
|
||||||
bb/copy-to-ebx 0x2a/imm32 # 42 in hex
|
|
||||||
b8/copy-to-eax 1/imm32/exit
|
Once generated, ELF binaries can be packaged up with a Linux kernel into a
|
||||||
cd/syscall 0x80/imm8
|
bootable disk image:
|
||||||
```
|
|
||||||
|
|
||||||
You can generate tiny zero-dependency ELF binaries from SubX that run on Linux.
|
|
||||||
|
|
||||||
```sh
|
|
||||||
$ ./bootstrap translate init.linux apps/ex1.subx -o apps/ex1 # on Linux or BSD or Mac
|
|
||||||
$ ./apps/ex1 # only on Linux
|
|
||||||
$ echo $?
|
|
||||||
42
|
|
||||||
```
|
|
||||||
|
|
||||||
(Running `bootstrap` requires a C++ compiler, transparently invoking it as
|
|
||||||
necessary.)
|
|
||||||
|
|
||||||
You can run the generated binaries on an interpreter/VM for better error
|
|
||||||
messages.
|
|
||||||
|
|
||||||
```sh
|
|
||||||
$ ./bootstrap run apps/ex1 # on Linux or BSD or Mac
|
|
||||||
$ echo $?
|
|
||||||
42
|
|
||||||
```
|
|
||||||
|
|
||||||
Emulated runs can generate a trace that permits [time-travel debugging](https://github.com/akkartik/mu/blob/master/tools/browse_trace.readme.md).
|
|
||||||
|
|
||||||
```sh
|
|
||||||
$ ./bootstrap --debug translate init.linux apps/factorial.subx -o apps/factorial
|
|
||||||
saving address->label information to 'labels'
|
|
||||||
saving address->source information to 'source_lines'
|
|
||||||
|
|
||||||
$ ./bootstrap --trace run apps/factorial
|
|
||||||
saving trace to 'last_run'
|
|
||||||
|
|
||||||
$ tools/browse_trace last_run # text-mode debugger UI
|
|
||||||
```
|
|
||||||
|
|
||||||
You can write tests for your programs. The entire stack is thoroughly covered
|
|
||||||
by automated tests. SubX's tagline: tests before syntax.
|
|
||||||
|
|
||||||
```sh
|
|
||||||
$ ./bootstrap test
|
|
||||||
$ ./bootstrap run apps/factorial test
|
|
||||||
```
|
|
||||||
|
|
||||||
You can use SubX to translate itself. For example, running natively on Linux:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
# generate translator phases using the C++ translator
|
|
||||||
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/hex.subx -o hex
|
|
||||||
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/survey.subx -o survey
|
|
||||||
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/pack.subx -o pack
|
|
||||||
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/assort.subx -o assort
|
|
||||||
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/dquotes.subx -o dquotes
|
|
||||||
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/tests.subx -o tests
|
|
||||||
$ chmod +x hex survey pack assort dquotes tests
|
|
||||||
|
|
||||||
# use the generated translator phases to translate SubX programs
|
|
||||||
$ cat init.linux apps/ex1.subx |./tests |./dquotes |./assort |./pack |./survey |./hex > a.elf
|
|
||||||
$ chmod +x a.elf
|
|
||||||
$ ./a.elf
|
|
||||||
$ echo $?
|
|
||||||
42
|
|
||||||
|
|
||||||
# or, automating the above steps
|
|
||||||
$ ./translate_subx init.linux apps/ex1.subx
|
|
||||||
$ ./a.elf
|
|
||||||
$ echo $?
|
|
||||||
42
|
|
||||||
```
|
|
||||||
|
|
||||||
Or, running in a VM on other platforms (much slower):
|
|
||||||
|
|
||||||
```sh
|
|
||||||
$ ./translate_subx_emulated init.linux apps/ex1.subx # generates identical a.elf to above
|
|
||||||
$ ./bootstrap run a.elf
|
|
||||||
$ echo $?
|
|
||||||
42
|
|
||||||
```
|
|
||||||
|
|
||||||
You can package up SubX binaries with the minimal hobbyist OS [Soso](https://github.com/ozkl/soso)
|
|
||||||
and run them on Qemu. (Requires graphics and sudo access. Currently doesn't
|
|
||||||
work on a cloud server.)
|
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
|
$ ./translate_mu apps/ex2.mu # emit a.elf
|
||||||
# dependencies
|
# dependencies
|
||||||
$ sudo apt install build-essential util-linux nasm xorriso # maybe also dosfstools and mtools
|
|
||||||
# package up a "hello world" program with a third-party kernel into mu_soso.iso
|
|
||||||
# requires sudo
|
|
||||||
$ tools/iso/soso init.soso apps/ex6.subx
|
|
||||||
# try it out
|
|
||||||
$ qemu-system-i386 -cdrom mu_soso.iso
|
|
||||||
```
|
|
||||||
|
|
||||||
You can also package up SubX binaries with a Linux kernel and run them on
|
|
||||||
either Qemu or [a cloud server that supports custom images](http://akkartik.name/post/iso-on-linode).
|
|
||||||
(Takes 12 minutes with 2GB RAM. Requires 12 million LoC of C for the Linux
|
|
||||||
kernel; that number will gradually go down.)
|
|
||||||
|
|
||||||
```sh
|
|
||||||
$ sudo apt install build-essential flex bison wget libelf-dev libssl-dev xorriso
|
$ sudo apt install build-essential flex bison wget libelf-dev libssl-dev xorriso
|
||||||
$ tools/iso/linux init.linux apps/ex6.subx
|
$ tools/iso/linux a.elf
|
||||||
$ qemu-system-x86_64 -m 256M -cdrom mu_linux.iso -boot d
|
$ qemu-system-x86_64 -m 256M -cdrom mu_linux.iso -boot d
|
||||||
```
|
```
|
||||||
|
|
||||||
## The syntax of SubX instructions
|
The disk image also runs on [any cloud server that supports custom images](http://akkartik.name/post/iso-on-linode).
|
||||||
|
|
||||||
Here is the above SubX example again:
|
Mu also runs on the minimal hobbyist OS [Soso](https://github.com/ozkl/soso).
|
||||||
|
(Requires graphics and sudo access. Currently doesn't work on a cloud server.)
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
bb/copy-to-ebx 0x2a/imm32 # 42 in hex
|
$ ./translate_mu apps/ex2.mu # emit a.elf
|
||||||
b8/copy-to-eax 1/imm32/exit
|
# dependencies
|
||||||
cd/syscall 0x80/imm8
|
$ sudo apt install build-essential util-linux nasm xorriso # maybe also dosfstools and mtools
|
||||||
|
$ tools/iso/soso a.elf # requires sudo
|
||||||
|
$ qemu-system-i386 -cdrom mu_soso.iso
|
||||||
```
|
```
|
||||||
|
|
||||||
Every line contains at most one instruction. Instructions consist of words
|
## Syntax
|
||||||
separated by whitespace. Words may be _opcodes_ (defining the operation being
|
|
||||||
performed) or _arguments_ (specifying the data the operation acts on). Any
|
|
||||||
word can have extra _metadata_ attached to it after `/`. Some metadata is
|
|
||||||
required (like the `/imm32` and `/imm8` above), but unrecognized metadata is
|
|
||||||
silently skipped so you can attach comments to words (like the instruction
|
|
||||||
name `/copy-to-eax` above, or the `/exit` argument).
|
|
||||||
|
|
||||||
What do all these numbers mean? SubX supports a small subset of the 32-bit x86
|
The entire stack shares certain properties and conventions. Programs consist
|
||||||
instruction set that likely runs on your computer. (Think of the name as short
|
of functions and functions consist of statements, each performing a single
|
||||||
for "sub-x86".) The instruction set contains instructions like `89/copy`,
|
operation. Operands to statements are always variables or constants. You can't
|
||||||
`01/add`, `3d/compare` and `51/push-ecx` which modify registers and a byte-addressable
|
say `a + b*c`, you have to break it up into two operations. Variables can live
|
||||||
memory. For a complete list of supported instructions, run `bootstrap help opcodes`.
|
in memory or in registers. Registers must be explicitly specified. There are
|
||||||
|
some shared lexical rules; comments always start with '#', and numbers are
|
||||||
|
always written in hex.
|
||||||
|
|
||||||
The registers instructions operate on are as follows:
|
Here's an example program in Mu:
|
||||||
|
|
||||||
- Six general-purpose 32-bit registers: `0/eax`, `1/ebx`, `2/ecx`, `3/edx`,
|
<img alt='ex2.mu' src='html/ex2.mu.png'>
|
||||||
`6/esi` and `7/edi`.
|
|
||||||
- Two additional 32-bit registers: `4/esp` and `5/ebp`. (I suggest you only
|
|
||||||
use these to manage the call stack.)
|
|
||||||
|
|
||||||
(SubX doesn't support floating-point registers yet. Intel processors support
|
[More details on Mu syntax →](mu.md)
|
||||||
an 8-bit mode, 16-bit mode and 64-bit mode. SubX will never support them.
|
|
||||||
There are also _many_ more instructions that SubX will never support.)
|
|
||||||
|
|
||||||
While SubX doesn't provide the usual mnemonics for opcodes, it _does_ provide
|
Here's an example program in SubX:
|
||||||
error-checking. If you miss an argument or accidentally add an extra argument,
|
|
||||||
you'll get a nice error. SubX won't arbitrarily interpret bytes of data as
|
|
||||||
instructions or vice versa.
|
|
||||||
|
|
||||||
It's worth distinguishing between an instruction's arguments and its _operands_.
|
```sh
|
||||||
Arguments are provided directly in instructions. Operands are pieces of data
|
== code
|
||||||
in register or memory that are operated on by instructions.
|
Entry:
|
||||||
|
# ebx = 1
|
||||||
Intel processors typically operate on no more than two operands, and at most
|
bb/copy-to-ebx 1/imm32
|
||||||
one of them (the 'reg/mem' operand) can access memory. The address of the
|
# increment ebx
|
||||||
reg/mem operand is constructed by expressions of one of these forms:
|
43/increment-ebx
|
||||||
|
# exit(ebx)
|
||||||
- `%reg`: operate on just a register, not memory
|
e8/call syscall_exit/disp32
|
||||||
- `*reg`: look up memory with the address in some register
|
|
||||||
- `*(reg + disp)`: add a constant to the address in some register
|
|
||||||
- `*(base + (index << scale) + disp)` where `base` and `index` are registers,
|
|
||||||
and `scale` and `disp` are 2- and 32-bit constants respectively.
|
|
||||||
|
|
||||||
Under the hood, SubX turns expressions of these forms into multiple arguments
|
|
||||||
with metadata in some complex ways. See [subx\_addressing\_modes.md](subx_addressing_modes.md).
|
|
||||||
|
|
||||||
That covers the complexities of the reg/mem operand. The second operand is
|
|
||||||
simpler. It comes from exactly one of the following argument types:
|
|
||||||
|
|
||||||
- `/r32`
|
|
||||||
- displacement: `/disp8` or `/disp32`
|
|
||||||
- immediate: `/imm8` or `/imm32`
|
|
||||||
|
|
||||||
Putting all this together, here's an example that adds the integer in `eax` to
|
|
||||||
the one at address `edx`:
|
|
||||||
|
|
||||||
```
|
|
||||||
01/add %edx 0/r32/eax
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## The syntax of SubX programs
|
[More details on SubX syntax →](subx.md)
|
||||||
|
|
||||||
SubX programs map to the same ELF binaries that a conventional Linux system
|
## Forks
|
||||||
uses. Linux ELF binaries consist of a series of _segments_. In particular, they
|
|
||||||
distinguish between code and data. Correspondingly, SubX programs consist of a
|
|
||||||
series of segments, each starting with a header line: `==` followed by a name
|
|
||||||
and approximate starting address.
|
|
||||||
|
|
||||||
All code must lie in a segment called 'code'.
|
|
||||||
|
|
||||||
Segments can be added to.
|
|
||||||
|
|
||||||
```sh
|
|
||||||
== code 0x09000000 # first mention requires starting address
|
|
||||||
...A...
|
|
||||||
|
|
||||||
== data 0x0a000000
|
|
||||||
...B...
|
|
||||||
|
|
||||||
== code # no address necessary when adding
|
|
||||||
...C...
|
|
||||||
```
|
|
||||||
|
|
||||||
The `code` segment now contains the instructions of `A` as well as `C`.
|
|
||||||
|
|
||||||
Within the `code` segment, each line contains a comment, label or instruction.
|
|
||||||
Comments start with a `#` and are ignored. Labels should always be the first
|
|
||||||
word on a line, and they end with a `:`.
|
|
||||||
|
|
||||||
Instructions can refer to labels in displacement or immediate arguments, and
|
|
||||||
they'll obtain a value based on the address of the label: immediate arguments
|
|
||||||
will contain the address directly, while displacement arguments will contain
|
|
||||||
the difference between the address and the address of the current instruction.
|
|
||||||
The latter is mostly useful for `jump` and `call` instructions.
|
|
||||||
|
|
||||||
Functions are defined using labels. By convention, labels internal to functions
|
|
||||||
(that must only be jumped to) start with a `$`. Any other labels must only be
|
|
||||||
called, never jumped to. All labels must be unique.
|
|
||||||
|
|
||||||
Functions are called using the following syntax:
|
|
||||||
```
|
|
||||||
(func arg1 arg2 ...)
|
|
||||||
```
|
|
||||||
|
|
||||||
Function arguments must be either literals (integers or strings) or a reg/mem
|
|
||||||
operand using the syntax in the previous section.
|
|
||||||
|
|
||||||
A special label is `Entry`, which can be used to specify/override the entry
|
|
||||||
point of the program. It doesn't have to be unique, and the latest definition
|
|
||||||
will override earlier ones.
|
|
||||||
|
|
||||||
(The `Entry` label, along with duplicate segment headers, allows programs to
|
|
||||||
be built up incrementally out of multiple [_layers_](http://akkartik.name/post/wart-layers).)
|
|
||||||
|
|
||||||
Another special pair of labels are the block delimiters `{` and `}`. They can
|
|
||||||
be nested, and jump instructions can take arguments `loop` or `break` that
|
|
||||||
jump to the enclosing `{` and `}` respectively.
|
|
||||||
|
|
||||||
The data segment consists of labels as before and byte values. Referring to
|
|
||||||
data labels in either `code` segment instructions or `data` segment values
|
|
||||||
yields their address.
|
|
||||||
|
|
||||||
Automatic tests are an important part of SubX, and there's a simple mechanism
|
|
||||||
to provide a test harness: all functions that start with `test-` are called in
|
|
||||||
turn by a special, auto-generated function called `run-tests`. How you choose
|
|
||||||
to call it is up to you.
|
|
||||||
|
|
||||||
I try to keep things simple so that there's less work to do when implementing
|
|
||||||
SubX in SubX. But there _is_ one convenience: instructions can provide a
|
|
||||||
string literal surrounded by quotes (`"`) in an `imm32` argument. SubX will
|
|
||||||
transparently copy it to the `data` segment and replace it with its address.
|
|
||||||
Strings are the only place where a SubX word is allowed to contain spaces.
|
|
||||||
|
|
||||||
That should be enough information for writing SubX programs. The `apps/`
|
|
||||||
directory provides some fodder for practice in the `apps/ex*.subx` files,
|
|
||||||
giving a more gradual introduction to SubX features. In particular, you should
|
|
||||||
work through `apps/factorial4.subx`, which demonstrates all the above ideas in
|
|
||||||
concert.
|
|
||||||
|
|
||||||
This repo includes binaries for all examples. At any commit, an example's
|
|
||||||
binary should be identical bit for bit with the result of translating the
|
|
||||||
corresponding `.subx` file. The binary should also be natively runnable on a
|
|
||||||
Linux system running on Intel x86 processors, either 32- or 64-bit. If either
|
|
||||||
of these invariants is broken it's a bug on my part.
|
|
||||||
|
|
||||||
## Running
|
|
||||||
|
|
||||||
`bootstrap` currently has the following sub-commands:
|
|
||||||
|
|
||||||
- `bootstrap help`: some helpful documentation to have at your fingertips.
|
|
||||||
|
|
||||||
- `bootstrap test`: runs all automated tests.
|
|
||||||
|
|
||||||
- `bootstrap translate <input files> -o <output ELF binary>`: translates `.subx`
|
|
||||||
files into an executable ELF binary.
|
|
||||||
|
|
||||||
- `bootstrap run <ELF binary> <args>`: simulates running the ELF binaries emitted
|
|
||||||
by `bootstrap translate`. Useful for testing and debugging.
|
|
||||||
|
|
||||||
Remember, not all 32-bit Linux binaries are guaranteed to run. I'm not
|
|
||||||
building general infrastructure here for all of the x86 instruction set.
|
|
||||||
SubX is about programming with a small, regular subset of 32-bit x86.
|
|
||||||
|
|
||||||
## Getting your editor set up
|
|
||||||
|
|
||||||
If you've read this far, it's time to set up your editor. Mu is really
|
|
||||||
intended to be read interactively rather than on a browser.
|
|
||||||
|
|
||||||
There is rudimentary syntax highlighting support for Mu and SubX files for
|
|
||||||
various editors. Look for your editor in `mu.*` and `subx.*`, and follow the
|
|
||||||
instructions within.
|
|
||||||
|
|
||||||
The Vim files are most developed. In particular, I recommend some optional
|
|
||||||
setup in subx.vim to use multiple colors for comments.
|
|
||||||
|
|
||||||
If you use [Exuberant Ctags](http://ctags.sourceforge.net) for jumping easily
|
|
||||||
from names to their definitions in your editor, copy the contents of `exuberant_ctags_rc`
|
|
||||||
into your `.ctags` file.
|
|
||||||
|
|
||||||
## A few hints for debugging
|
|
||||||
|
|
||||||
Writing programs in SubX is surprisingly pleasant and addictive. Reading
|
|
||||||
programs is a work in progress, and hopefully the extensive unit tests help.
|
|
||||||
However, _debugging_ programs is where one really faces up to the low-level
|
|
||||||
nature of SubX. Even the smallest modifications need testing to make sure they
|
|
||||||
work. In my experience, there is no modification so small that I get it working
|
|
||||||
on the first attempt. And when it doesn't work, there are no clear error
|
|
||||||
messages. Machine code is too simple-minded for that. You can't use a debugger,
|
|
||||||
since SubX's simplistic ELF binaries contain no debugging information. So
|
|
||||||
debugging requires returning to basics and practicing with a new, more
|
|
||||||
rudimentary but hopefully still workable toolkit:
|
|
||||||
|
|
||||||
- Start by nailing down a concrete set of steps for reproducibly obtaining the
|
|
||||||
error or erroneous behavior.
|
|
||||||
|
|
||||||
- If possible, turn the steps into a failing test. It's not always possible,
|
|
||||||
but SubX's primary goal is to keep improving the variety of tests one can
|
|
||||||
write.
|
|
||||||
|
|
||||||
- Start running the single failing test alone. This involves modifying the top
|
|
||||||
of the program (or the final `.subx` file passed in to `bootstrap translate`) by
|
|
||||||
replacing the call to `run-tests` with a call to the appropriate `test-`
|
|
||||||
function.
|
|
||||||
|
|
||||||
- Generate a trace for the failing test while running your program in emulated
|
|
||||||
mode (`bootstrap run`):
|
|
||||||
```
|
|
||||||
$ ./bootstrap translate input.subx -o binary
|
|
||||||
$ ./bootstrap --trace run binary arg1 arg2 2>trace
|
|
||||||
```
|
|
||||||
The ability to generate a trace is the essential reason for the existence of
|
|
||||||
`bootstrap run` mode. It gives far better visibility into program internals than
|
|
||||||
running natively.
|
|
||||||
|
|
||||||
- As a further refinement, it is possible to render label names in the trace
|
|
||||||
by adding a second flag to the `bootstrap translate` command:
|
|
||||||
```
|
|
||||||
$ ./bootstrap --debug translate input.subx -o binary
|
|
||||||
$ ./bootstrap --trace run binary arg1 arg2 2>trace
|
|
||||||
```
|
|
||||||
`bootstrap --debug translate` emits a mapping from label to address in a file
|
|
||||||
called `labels`. `bootstrap --trace run` reads in the `labels` file if
|
|
||||||
it exists and prints out any matching label name as it traces each instruction
|
|
||||||
executed.
|
|
||||||
|
|
||||||
Here's a sample of what a trace looks like, with a few boxes highlighted:
|
|
||||||
|
|
||||||
<img alt='trace example' src='html/trace.png'>
|
|
||||||
|
|
||||||
Each of the green boxes shows the trace emitted for a single instruction.
|
|
||||||
It starts with a line of the form `run: inst: ___` followed by the opcode
|
|
||||||
for the instruction, the state of registers before the instruction executes,
|
|
||||||
and various other facts deduced during execution. Some instructions first
|
|
||||||
print a matching label. In the above screenshot, the red boxes show that
|
|
||||||
address `0x0900005e` maps to label `$loop` and presumably marks the start of
|
|
||||||
some loop. Function names get similar `run: == label` lines.
|
|
||||||
|
|
||||||
- One trick when emitting traces with labels:
|
|
||||||
```
|
|
||||||
$ grep label trace
|
|
||||||
```
|
|
||||||
This is useful for quickly showing you the control flow for the run, and the
|
|
||||||
function executing when the error occurred. I find it useful to start with
|
|
||||||
this information, only looking at the complete trace after I've gotten
|
|
||||||
oriented on the control flow. Did it get to the loop I just modified? How
|
|
||||||
many times did it go through the loop?
|
|
||||||
|
|
||||||
- Once you have SubX displaying labels in traces, it's a short step to modify
|
|
||||||
the program to insert more labels just to gain more insight. For example,
|
|
||||||
consider the following function:
|
|
||||||
|
|
||||||
<img alt='control example -- before' src='html/control0.png'>
|
|
||||||
|
|
||||||
This function contains a series of jump instructions. If a trace shows
|
|
||||||
`is-hex-lowercase-byte?` being encountered, and then `$is-hex-lowercase-byte?:end`
|
|
||||||
being encountered, it's still ambiguous what happened. Did we hit an early
|
|
||||||
exit, or did we execute all the way through? To clarify this, add temporary
|
|
||||||
labels after each jump:
|
|
||||||
|
|
||||||
<img alt='control example -- after' src='html/control1.png'>
|
|
||||||
|
|
||||||
Now the trace should have a lot more detail on which of these labels was
|
|
||||||
reached, and precisely when the exit was taken.
|
|
||||||
|
|
||||||
- If you find yourself wondering, "when did the contents of this memory
|
|
||||||
address change?", `bootstrap run` has some rudimentary support for _watch
|
|
||||||
points_. Just insert a label starting with `$watch-` before an instruction
|
|
||||||
that writes to the address, and its value will start getting dumped to the
|
|
||||||
trace after every instruction thereafter.
|
|
||||||
|
|
||||||
- Once we have a sense for precisely which instructions we want to look at,
|
|
||||||
it's time to look at the trace as a whole. Key is the state of registers
|
|
||||||
before each instruction. If a function is receiving bad arguments it becomes
|
|
||||||
natural to inspect what values were pushed on the stack before calling it,
|
|
||||||
tracing back further from there, and so on.
|
|
||||||
|
|
||||||
I occasionally want to see the precise state of the stack segment, in which
|
|
||||||
case I uncomment a commented-out call to `dump_stack()` in the `vm.cc`
|
|
||||||
layer. It makes the trace a lot more verbose and a lot less dense, necessitating
|
|
||||||
a lot more scrolling around, so I keep it turned off most of the time.
|
|
||||||
|
|
||||||
- If the trace seems overwhelming, try [browsing it](https://github.com/akkartik/mu/blob/master/tools/browse_trace.readme.md)
|
|
||||||
in the 'time-travel debugger'.
|
|
||||||
|
|
||||||
- Don't be afraid to slice and dice the trace using Unix tools. For example,
|
|
||||||
say you have a SubX binary that dies while running tests. You can see what
|
|
||||||
test it's segfaulting at by compiling it with debug information using
|
|
||||||
`./translate_subx_debug`, and then running:
|
|
||||||
|
|
||||||
```
|
|
||||||
./bootstrap --trace --dump run a.elf test 2>&1 |grep 'label test'
|
|
||||||
```
|
|
||||||
|
|
||||||
Just read out the last test printed out before the segfault.
|
|
||||||
|
|
||||||
Hopefully these hints are enough to get you started. The main thing to
|
|
||||||
remember is to not be afraid of modifying the sources. A good debugging
|
|
||||||
session gets into a nice rhythm of generating a trace, staring at it for a
|
|
||||||
while, modifying the sources, regenerating the trace, and so on. Email
|
|
||||||
[me](mailto:mu@akkartik.com) if you'd like another pair of eyes to stare at a
|
|
||||||
trace, or if you have questions or complaints.
|
|
||||||
|
|
||||||
## Reference documentation on available primitives
|
|
||||||
|
|
||||||
### Data Structures
|
|
||||||
|
|
||||||
- Kernel strings: null-terminated regions of memory. Unsafe and to be avoided,
|
|
||||||
but needed for interacting with the kernel.
|
|
||||||
|
|
||||||
- Arrays: length-prefixed regions of memory containing multiple elements of a
|
|
||||||
single type. Contents are preceded by 4 bytes (32 bits) containing the
|
|
||||||
`length` of the array in bytes.
|
|
||||||
|
|
||||||
- Slices: a pair of 32-bit addresses denoting a [half-open](https://en.wikipedia.org/wiki/Interval_(mathematics))
|
|
||||||
\[`start`, `end`) interval to live memory with a consistent lifetime.
|
|
||||||
|
|
||||||
Invariant: `start` <= `end`
|
|
||||||
|
|
||||||
- Streams: strings prefixed by 32-bit `write` and `read` indexes that the next
|
|
||||||
write or read goes to, respectively.
|
|
||||||
|
|
||||||
- offset 0: write index
|
|
||||||
- offset 4: read index
|
|
||||||
- offset 8: length of array (in bytes)
|
|
||||||
- offset 12: start of array data
|
|
||||||
|
|
||||||
Invariant: 0 <= `read` <= `write` <= `length`
|
|
||||||
|
|
||||||
- File descriptors (fd): Low-level 32-bit integers that the kernel uses to
|
|
||||||
track files opened by the program.
|
|
||||||
|
|
||||||
- File: 32-bit value containing either a fd or an address to a stream (fake
|
|
||||||
file).
|
|
||||||
|
|
||||||
- Buffered files (buffered-file): Contain a file descriptor and a stream for
|
|
||||||
buffering reads/writes. Each `buffered-file` must exclusively perform either
|
|
||||||
reads or writes.
|
|
||||||
|
|
||||||
### 'system calls'
|
|
||||||
|
|
||||||
As I said at the top, a primary design goal of SubX (and Mu more broadly) is
|
|
||||||
to explore ways to turn arbitrary manual tests into reproducible automated
|
|
||||||
tests. SubX aims for this goal by baking testable interfaces deep into the
|
|
||||||
stack, at the OS syscall level. The idea is that every syscall that interacts
|
|
||||||
with hardware (and so the environment) should be *dependency injected* so that
|
|
||||||
it's possible to insert fake hardware in tests.
|
|
||||||
|
|
||||||
But those are big goals. Here are the syscalls I have so far:
|
|
||||||
|
|
||||||
- `write`: takes two arguments, a file `f` and an address to array `s`.
|
|
||||||
|
|
||||||
Comparing this interface with the Unix `write()` syscall shows two benefits:
|
|
||||||
|
|
||||||
1. SubX can handle 'fake' file descriptors in tests.
|
|
||||||
|
|
||||||
1. `write()` accepts buffer and its length in separate arguments, which
|
|
||||||
requires callers to manage the two separately and so can be error-prone.
|
|
||||||
SubX's wrapper keeps the two together to increase the chances that we
|
|
||||||
never accidentally go out of array bounds.
|
|
||||||
|
|
||||||
- `read`: takes two arguments, a file `f` and an address to stream `s`. Reads
|
|
||||||
as much data from `f` as can fit in (the free space of) `s`.
|
|
||||||
|
|
||||||
Like with `write()`, this wrapper around the Unix `read()` syscall adds the
|
|
||||||
ability to handle 'fake' file descriptors in tests, and reduces the chances
|
|
||||||
of clobbering outside array bounds.
|
|
||||||
|
|
||||||
One bit of weirdness here: in tests we do a redundant copy from one stream
|
|
||||||
to another. See [the comments before the implementation](http://akkartik.github.io/mu/html/060read.subx.html)
|
|
||||||
for a discussion of alternative interfaces.
|
|
||||||
|
|
||||||
- `stop`: takes two arguments:
|
|
||||||
- `ed` is an address to an _exit descriptor_. Exit descriptors allow us to
|
|
||||||
`exit()` the program in production, but return to the test harness within
|
|
||||||
tests. That allows tests to make assertions about when `exit()` is called.
|
|
||||||
- `value` is the status code to `exit()` with.
|
|
||||||
|
|
||||||
For more details on exit descriptors and how to create one, see [the
|
|
||||||
comments before the implementation](http://akkartik.github.io/mu/html/059stop.subx.html).
|
|
||||||
|
|
||||||
- `new-segment`
|
|
||||||
|
|
||||||
Allocates a whole new segment of memory for the program, discontiguous with
|
|
||||||
both existing code and data (heap) segments. Just a more opinionated form of
|
|
||||||
[`mmap`](http://man7.org/linux/man-pages/man2/mmap.2.html).
|
|
||||||
|
|
||||||
- `allocate`: takes two arguments, an address to allocation-descriptor `ad`
|
|
||||||
and an integer `n`
|
|
||||||
|
|
||||||
Allocates a contiguous range of memory that is guaranteed to be exclusively
|
|
||||||
available to the caller. Returns the starting address to the range in `eax`.
|
|
||||||
|
|
||||||
An allocation descriptor tracks allocated vs available addresses in some
|
|
||||||
contiguous range of memory. The int specifies the number of bytes to allocate.
|
|
||||||
|
|
||||||
Explicitly passing in an allocation descriptor allows for nested memory
|
|
||||||
management, where a sub-system gets a chunk of memory and further parcels it
|
|
||||||
out to individual allocations. Particularly helpful for (surprise) tests.
|
|
||||||
|
|
||||||
- ... _(to be continued)_
|
|
||||||
|
|
||||||
I will continue to import syscalls over time from [the old Mu VM in the parent
|
|
||||||
directory](https://github.com/akkartik/mu), which has experimented with
|
|
||||||
interfaces for the screen, keyboard, mouse, disk and network.
|
|
||||||
|
|
||||||
### primitives built atop system calls
|
|
||||||
|
|
||||||
_(Compound arguments are usually passed in by reference. Where the results are
|
|
||||||
compound objects that don't fit in a register, the caller usually passes in
|
|
||||||
allocated memory for it.)_
|
|
||||||
|
|
||||||
#### assertions for tests
|
|
||||||
- `check-ints-equal`: fails current test if given ints aren't equal
|
|
||||||
- `check-stream-equal`: fails current test if stream doesn't match string
|
|
||||||
- `check-next-stream-line-equal`: fails current test if next line of stream
|
|
||||||
until newline doesn't match string
|
|
||||||
|
|
||||||
#### error handling
|
|
||||||
- `error`: takes three arguments, an exit-descriptor, a file and a string (message)
|
|
||||||
|
|
||||||
Prints out the message to the file and then exits using the provided
|
|
||||||
exit-descriptor.
|
|
||||||
|
|
||||||
- `error-byte`: like `error` but takes an extra byte value that it prints out
|
|
||||||
at the end of the message.
|
|
||||||
|
|
||||||
#### predicates
|
|
||||||
- `kernel-string-equal?`: compares a kernel string with a string
|
|
||||||
- `string-equal?`: compares two strings
|
|
||||||
- `stream-data-equal?`: compares a stream with a string
|
|
||||||
- `next-stream-line-equal?`: compares with string the next line in a stream, from
|
|
||||||
`read` index to newline
|
|
||||||
|
|
||||||
- `slice-empty?`: checks if the `start` and `end` of a slice are equal
|
|
||||||
- `slice-equal?`: compares a slice with a string
|
|
||||||
- `slice-starts-with?`: compares the start of a slice with a string
|
|
||||||
- `slice-ends-with?`: compares the end of a slice with a string
|
|
||||||
|
|
||||||
#### writing to disk
|
|
||||||
- `write`: string -> file
|
|
||||||
- Can also be used to cat a string into a stream.
|
|
||||||
- Will abort the entire program if destination is a stream and doesn't have
|
|
||||||
enough room.
|
|
||||||
- `write-stream`: stream -> file
|
|
||||||
- Can also be used to cat one stream into another.
|
|
||||||
- Will abort the entire program if destination is a stream and doesn't have
|
|
||||||
enough room.
|
|
||||||
- `write-slice`: slice -> stream
|
|
||||||
- Will abort the entire program if there isn't enough room in the
|
|
||||||
destination stream.
|
|
||||||
- `append-byte`: int -> stream
|
|
||||||
- Will abort the entire program if there isn't enough room in the
|
|
||||||
destination stream.
|
|
||||||
- `append-byte-hex`: int -> stream
|
|
||||||
- textual representation in hex, no '0x' prefix
|
|
||||||
- Will abort the entire program if there isn't enough room in the
|
|
||||||
destination stream.
|
|
||||||
- `print-int32`: int -> stream
|
|
||||||
- textual representation in hex, including '0x' prefix
|
|
||||||
- Will abort the entire program if there isn't enough room in the
|
|
||||||
destination stream.
|
|
||||||
- `write-buffered`: string -> buffered-file
|
|
||||||
- `write-slice-buffered`: slice -> buffered-file
|
|
||||||
- `flush`: buffered-file
|
|
||||||
- `write-byte-buffered`: int -> buffered-file
|
|
||||||
- `print-byte-buffered`: int -> buffered-file
|
|
||||||
- textual representation in hex, no '0x' prefix
|
|
||||||
- `print-int32-buffered`: int -> buffered-file
|
|
||||||
- textual representation in hex, including '0x' prefix
|
|
||||||
|
|
||||||
#### reading from disk
|
|
||||||
- `read`: file -> stream
|
|
||||||
- Can also be used to cat one stream into another.
|
|
||||||
- Will silently stop reading when destination runs out of space.
|
|
||||||
- `read-byte-buffered`: buffered-file -> byte
|
|
||||||
- `read-line-buffered`: buffered-file -> stream
|
|
||||||
- Will abort the entire program if there isn't enough room.
|
|
||||||
|
|
||||||
#### non-IO operations on streams
|
|
||||||
- `new-stream`: allocates space for a stream of `n` elements, each occupying
|
|
||||||
`b` bytes.
|
|
||||||
- Will abort the entire program if `n*b` requires more than 32 bits.
|
|
||||||
- `clear-stream`: resets everything in the stream to `0` (except its `length`).
|
|
||||||
- `rewind-stream`: resets the read index of the stream to `0` without modifying
|
|
||||||
its contents.
|
|
||||||
|
|
||||||
#### reading/writing hex representations of integers
|
|
||||||
- `is-hex-int?`: takes a slice argument, returns boolean result in `eax`
|
|
||||||
- `parse-hex-int`: takes a slice argument, returns int result in `eax`
|
|
||||||
- `is-hex-digit?`: takes a 32-bit word containing a single byte, returns
|
|
||||||
boolean result in `eax`.
|
|
||||||
- `from-hex-char`: takes a hexadecimal digit character in `eax`, returns its
|
|
||||||
numeric value in `eax`
|
|
||||||
- `to-hex-char`: takes a single-digit numeric value in `eax`, returns its
|
|
||||||
corresponding hexadecimal character in `eax`
|
|
||||||
|
|
||||||
#### tokenization
|
|
||||||
|
|
||||||
from a stream:
|
|
||||||
- `next-token`: stream, delimiter byte -> slice
|
|
||||||
- `skip-chars-matching`: stream, delimiter byte
|
|
||||||
- `skip-chars-not-matching`: stream, delimiter byte
|
|
||||||
|
|
||||||
from a slice:
|
|
||||||
- `next-token-from-slice`: start, end, delimiter byte -> slice
|
|
||||||
- Given a slice and a delimiter byte, returns a new slice inside the input
|
|
||||||
that ends at the delimiter byte.
|
|
||||||
|
|
||||||
- `skip-chars-matching-in-slice`: curr, end, delimiter byte -> new-curr (in `eax`)
|
|
||||||
- `skip-chars-not-matching-in-slice`: curr, end, delimiter byte -> new-curr (in `eax`)
|
|
||||||
|
|
||||||
## Resources
|
|
||||||
|
|
||||||
- [Single-page cheatsheet for the x86 ISA](https://net.cs.uni-bonn.de/fileadmin/user_upload/plohmann/x86_opcode_structure_and_instruction_overview.pdf)
|
|
||||||
(pdf; [cached local copy](https://github.com/akkartik/mu/blob/master/cheatsheet.pdf))
|
|
||||||
- [Concise reference for the x86 ISA](https://c9x.me/x86)
|
|
||||||
- [Intel processor manual](http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf) (pdf)
|
|
||||||
- [“Bootstrapping a compiler from nothing”](http://web.archive.org/web/20061108010907/http://www.rano.org/bcompiler.html) by Edmund Grumley-Evans.
|
|
||||||
- [“Creating tiny ELF executables”](https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html) by Brian Raiter.
|
|
||||||
- [StoneKnifeForth](https://github.com/kragen/stoneknifeforth) by [Kragen Sitaker](http://canonical.org/~kragen).
|
|
||||||
|
|
||||||
## Other forks
|
|
||||||
|
|
||||||
Forks of Mu are encouraged. If you don't like something about this repo, feel
|
Forks of Mu are encouraged. If you don't like something about this repo, feel
|
||||||
free to make a fork. If you show it to me, I'll link to it here, so others can
|
free to make a fork. If you show it to me, I'll link to it here, so others can
|
||||||
|
@ -726,64 +145,34 @@ use it. I might even pull your changes into this repo!
|
||||||
- [mu-normie](https://git.sr.ht/~akkartik/mu-normie): with a more standard
|
- [mu-normie](https://git.sr.ht/~akkartik/mu-normie): with a more standard
|
||||||
build system and C++ modules.
|
build system and C++ modules.
|
||||||
|
|
||||||
## Conclusion
|
## Desiderata
|
||||||
|
|
||||||
The hypothesis of Mu and SubX is that designing the entire system to be
|
|
||||||
testable from day 1 and from the ground up would radically impact the culture
|
|
||||||
of the eco-system in a way that no bolted-on tool or service at higher levels
|
|
||||||
can replicate:
|
|
||||||
|
|
||||||
- Tests would make it easier to write programs that can be easily understood
|
|
||||||
by newcomers.
|
|
||||||
|
|
||||||
- More broad-based understanding would lead to more forks.
|
|
||||||
|
|
||||||
- Tests would make it easy to share code across forks. Copy the tests over,
|
|
||||||
and then copy code over and polish it until the tests pass. Manual work, but
|
|
||||||
tractable and without major risks.
|
|
||||||
|
|
||||||
- The community would gain a diversified portfolio of forks for each program,
|
|
||||||
a “wavefront” of possible combinations of features and alternative
|
|
||||||
implementations of features. Application writers who wrote thorough tests
|
|
||||||
for their apps (something they just can’t do today) would be able to bounce
|
|
||||||
around between forks more easily without getting locked in to a single one
|
|
||||||
as currently happens.
|
|
||||||
|
|
||||||
- There would be a stronger culture of reviewing the code for programs you use
|
|
||||||
or libraries you depend on. [More eyeballs would make more bugs shallow.](https://en.wikipedia.org/wiki/Linus%27s_Law)
|
|
||||||
|
|
||||||
To falsify these hypotheses, here's a roadmap of the next few planned features:
|
|
||||||
|
|
||||||
- Testable, dependency-injected vocabulary of primitives
|
|
||||||
- Streams: `read()`, `write()`. (✓)
|
|
||||||
- `exit()` (✓)
|
|
||||||
- Client-like non-blocking socket/file primitives: `load`, `save`
|
|
||||||
- Concurrency, and a framework for testing blocking code
|
|
||||||
- Server-like blocking socket/file primitives
|
|
||||||
|
|
||||||
- Gradually streamline the bundled kernel, stripping away code we don't need.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
If you're still reading, here are some more things to check out:
|
If you're still reading, here are some more things to check out:
|
||||||
|
|
||||||
a) Try running the tests: `./test_apps`
|
- The references on [Mu](mu.md) and [SubX](subx.md) syntax, and also [bare
|
||||||
|
SubX](subx_bare.md) without any syntax sugar.
|
||||||
|
|
||||||
b) There's a handy [summary](mu_instructions) of how the Mu compiler translates
|
- [How to get your text editor set up for Mu and SubX programs.](editor.md)
|
||||||
instructions to SubX.
|
|
||||||
|
|
||||||
c) Check out the online help on SubX. Starting point: `./bootstrap`
|
- [Some tips for debugging Mu and SubX programs.](debugging.md)
|
||||||
|
|
||||||
d) Familiarize yourself with the list of opcodes supported in SubX: `./bootstrap
|
- [Shared vocabulary of data types and functions shared by Mu programs.](vocabulary.md)
|
||||||
help opcodes`. (It's also [in this repo](https://github.com/akkartik/mu/blob/master/subx_opcodes).)
|
Mu programs can transparently call low-level functions written in SubX.
|
||||||
[Here](https://lobste.rs/s/qglfdp/subx_minimalist_assembly_language_for#c_o9ddqk)
|
|
||||||
are some tips on my setup for quickly finding the right opcode for any
|
|
||||||
situation from within Vim.
|
|
||||||
|
|
||||||
e) Try working on [some starter SubX exercises](https://github.com/akkartik/mu/pulls)
|
- [A summary](mu_instructions) of how the Mu compiler translates instructions
|
||||||
(labelled `hello`).
|
to SubX. ([colorized version](http://akkartik.github.io/mu/html/mu_instructions.html))
|
||||||
|
|
||||||
f) SubX comes with some useful [syntax sugar](http://akkartik.name/post/mu-2019-1).
|
- [Some starter exercises for learning SubX](https://github.com/akkartik/mu/pulls)
|
||||||
|
(labelled `hello`). Feel free to [ping me](mailto:ak@akkartik.com) with any questions.
|
||||||
|
|
||||||
|
- [Commandline reference for the bootstrap C++ program.](bootstrap.md)
|
||||||
|
|
||||||
|
- The [list of x86 opcodes](subx_opcodes) supported in SubX: `./bootstrap
|
||||||
|
help opcodes`.
|
||||||
|
|
||||||
|
- [Some details on the unconventional organization of this project.](http://akkartik.name/post/four-repos)
|
||||||
|
|
||||||
|
- Previous prototypes: [mu0](https://github.com/akkartik/mu0), [mu1](https://github.com/akkartik/mu1).
|
||||||
|
|
||||||
## Credits
|
## Credits
|
||||||
|
|
||||||
|
@ -795,36 +184,15 @@ Mu builds on many ideas that have come before, especially:
|
||||||
and [Richard Gabriel](http://dreamsongs.net/Files/PatternsOfSoftware.pdf) for
|
and [Richard Gabriel](http://dreamsongs.net/Files/PatternsOfSoftware.pdf) for
|
||||||
the intellectual tools for reasoning about the higher order design of a
|
the intellectual tools for reasoning about the higher order design of a
|
||||||
codebase;
|
codebase;
|
||||||
- Unix and C for showing us how to co-evolve language and OS, and for teaching
|
|
||||||
the (much maligned, misunderstood and underestimated) value of concise
|
|
||||||
*implementation* in addition to a clean interface;
|
|
||||||
- Donald Knuth's [literate programming](http://www.literateprogramming.com/knuthweb.pdf)
|
|
||||||
for liberating "code for humans to read" from the tyranny of compiler order;
|
|
||||||
- [David Parnas](http://www.cs.umd.edu/class/spring2003/cmsc838p/Design/criteria.pdf)
|
- [David Parnas](http://www.cs.umd.edu/class/spring2003/cmsc838p/Design/criteria.pdf)
|
||||||
and others for highlighting the value of separating concerns and stepwise
|
and others for highlighting the value of separating concerns and stepwise
|
||||||
refinement;
|
refinement;
|
||||||
- [Lisp](http://www.paulgraham.com/rootsoflisp.html) for showing the power of
|
|
||||||
dynamic languages, late binding and providing the right primitives *a la
|
|
||||||
carte*, especially lisp macros;
|
|
||||||
- The folklore of debugging by print and the trace facility in many lisp
|
- The folklore of debugging by print and the trace facility in many lisp
|
||||||
systems;
|
systems;
|
||||||
- Automated tests for showing the value of developing programs inside an
|
- Automated tests for showing the value of developing programs inside an
|
||||||
elaborate harness;
|
elaborate harness;
|
||||||
- [Python doctest](http://docs.python.org/2/library/doctest.html) for
|
|
||||||
exemplifying interactive documentation that doubles as tests;
|
|
||||||
- [ReStructuredText](https://en.wikipedia.org/wiki/ReStructuredText)
|
|
||||||
and [its antecedents](https://en.wikipedia.org/wiki/Setext) for showing that
|
|
||||||
markup can be clean;
|
|
||||||
- BDD for challenging us all to write tests at a higher level;
|
|
||||||
- JavaScript and CSS for demonstrating the power of a DOM for complex
|
|
||||||
structured documents;
|
|
||||||
- Rust for demonstrating that a system-programming language can be safe;
|
|
||||||
- Forth for demonstrating that ergonomics don't require grammar; and
|
|
||||||
- [Minimal Linux Live](http://minimal.linux-bg.org) for teaching how to create
|
- [Minimal Linux Live](http://minimal.linux-bg.org) for teaching how to create
|
||||||
a bootable disk image.
|
a bootable disk image.
|
||||||
- [Soso](https://github.com/ozkl/soso), a tiny hackable OS.
|
- [“Bootstrapping a compiler from nothing”](http://web.archive.org/web/20061108010907/http://www.rano.org/bcompiler.html) by Edmund Grumley-Evans.
|
||||||
|
- [“Creating tiny ELF executables”](https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html) by Brian Raiter.
|
||||||
## Coda
|
- [StoneKnifeForth](https://github.com/kragen/stoneknifeforth) by [Kragen Sitaker](http://canonical.org/~kragen).
|
||||||
|
|
||||||
- [Some details on the unconventional organization of this project.](http://akkartik.name/post/four-repos)
|
|
||||||
- Previous prototypes: [mu0](https://github.com/akkartik/mu0), [mu1](https://github.com/akkartik/mu1).
|
|
||||||
|
|
|
@ -9,3 +9,9 @@ Some apps written in SubX and Mu, in 3 categories:
|
||||||
* More ambitious translator for a memory-safe language (in progress): `mu`
|
* More ambitious translator for a memory-safe language (in progress): `mu`
|
||||||
|
|
||||||
* Miscellaneous test programs.
|
* Miscellaneous test programs.
|
||||||
|
|
||||||
|
All SubX apps include binaries. At any commit, an example's binary should be
|
||||||
|
identical bit for bit with the result of translating the corresponding `.subx`
|
||||||
|
file. The binary should also be natively runnable on a Linux system running on
|
||||||
|
Intel x86 processors, either 32- or 64-bit. If either of these invariants is
|
||||||
|
broken, it's a bug.
|
||||||
|
|
|
@ -1,19 +0,0 @@
|
||||||
# Increment a number, and return the result in the exit code.
|
|
||||||
#
|
|
||||||
# To run:
|
|
||||||
# $ ./translate_mu apps/ex2.2.mu
|
|
||||||
# $ ./a.elf
|
|
||||||
# Expected result:
|
|
||||||
# $ echo $?
|
|
||||||
# 7
|
|
||||||
|
|
||||||
fn main -> result/ebx: int {
|
|
||||||
result <- foo
|
|
||||||
}
|
|
||||||
|
|
||||||
fn foo -> result/ebx: int {
|
|
||||||
var n: int
|
|
||||||
copy-to n, 3
|
|
||||||
increment n
|
|
||||||
result <- copy n
|
|
||||||
}
|
|
|
@ -1,4 +1,4 @@
|
||||||
# Add two numbers, and return the result in the exit code.
|
# Add 3 and 4, and return the result in the exit code.
|
||||||
#
|
#
|
||||||
# To run:
|
# To run:
|
||||||
# $ ./translate_mu apps/ex2.mu
|
# $ ./translate_mu apps/ex2.mu
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
# Add 1 and 1, and return the result in the exit code.
|
# Add 3 and 4, and return the result in the exit code.
|
||||||
#
|
#
|
||||||
# To run:
|
# To run:
|
||||||
# $ ./bootstrap translate init.linux apps/ex2.subx -o apps/ex2
|
# $ ./bootstrap translate init.linux apps/ex2.subx -o apps/ex2
|
||||||
|
@ -10,10 +10,10 @@
|
||||||
== code
|
== code
|
||||||
|
|
||||||
Entry:
|
Entry:
|
||||||
# ebx = 1
|
# ebx = 3
|
||||||
bb/copy-to-ebx 1/imm32
|
bb/copy-to-ebx 3/imm32
|
||||||
# increment ebx
|
# add 4 to ebx
|
||||||
43/increment-ebx
|
81 0/subop/add 3/mod/direct 3/rm32/ebx 4/imm32
|
||||||
# exit(ebx)
|
# exit(ebx)
|
||||||
e8/call syscall_exit/disp32
|
e8/call syscall_exit/disp32
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,17 @@
|
||||||
|
## Running
|
||||||
|
|
||||||
|
`bootstrap` currently has the following sub-commands:
|
||||||
|
|
||||||
|
- `bootstrap help`: some helpful documentation to have at your fingertips.
|
||||||
|
|
||||||
|
- `bootstrap test`: runs all automated tests.
|
||||||
|
|
||||||
|
- `bootstrap translate <input files> -o <output ELF binary>`: translates `.subx`
|
||||||
|
files into an executable ELF binary.
|
||||||
|
|
||||||
|
- `bootstrap run <ELF binary> <args>`: simulates running the ELF binaries emitted
|
||||||
|
by `bootstrap translate`. Useful for testing and debugging.
|
||||||
|
|
||||||
|
Remember, not all 32-bit Linux binaries are guaranteed to run. I'm not
|
||||||
|
building general infrastructure here for all of the x86 instruction set.
|
||||||
|
SubX is about programming with a small, regular subset of 32-bit x86.
|
|
@ -0,0 +1,122 @@
|
||||||
|
## A few hints for debugging
|
||||||
|
|
||||||
|
Writing programs in SubX is surprisingly pleasant and addictive. Reading
|
||||||
|
programs is a work in progress, and hopefully the extensive unit tests help.
|
||||||
|
However, _debugging_ programs is where one really faces up to the low-level
|
||||||
|
nature of SubX. Even the smallest modifications need testing to make sure they
|
||||||
|
work. In my experience, there is no modification so small that I get it working
|
||||||
|
on the first attempt. And when it doesn't work, there are no clear error
|
||||||
|
messages. Machine code is too simple-minded for that. You can't use a debugger,
|
||||||
|
since SubX's simplistic ELF binaries contain no debugging information. So
|
||||||
|
debugging requires returning to basics and practicing with a new, more
|
||||||
|
rudimentary but hopefully still workable toolkit:
|
||||||
|
|
||||||
|
- Start by nailing down a concrete set of steps for reproducibly obtaining the
|
||||||
|
error or erroneous behavior.
|
||||||
|
|
||||||
|
- If possible, turn the steps into a failing test. It's not always possible,
|
||||||
|
but SubX's primary goal is to keep improving the variety of tests one can
|
||||||
|
write.
|
||||||
|
|
||||||
|
- Start running the single failing test alone. This involves modifying the top
|
||||||
|
of the program (or the final `.subx` file passed in to `bootstrap translate`) by
|
||||||
|
replacing the call to `run-tests` with a call to the appropriate `test-`
|
||||||
|
function.
|
||||||
|
|
||||||
|
- Generate a trace for the failing test while running your program in emulated
|
||||||
|
mode (`bootstrap run`):
|
||||||
|
```
|
||||||
|
$ ./bootstrap translate input.subx -o binary
|
||||||
|
$ ./bootstrap --trace run binary arg1 arg2 2>trace
|
||||||
|
```
|
||||||
|
The ability to generate a trace is the essential reason for the existence of
|
||||||
|
`bootstrap run` mode. It gives far better visibility into program internals than
|
||||||
|
running natively.
|
||||||
|
|
||||||
|
- As a further refinement, it is possible to render label names in the trace
|
||||||
|
by adding a second flag to the `bootstrap translate` command:
|
||||||
|
```
|
||||||
|
$ ./bootstrap --debug translate input.subx -o binary
|
||||||
|
$ ./bootstrap --trace run binary arg1 arg2 2>trace
|
||||||
|
```
|
||||||
|
`bootstrap --debug translate` emits a mapping from label to address in a file
|
||||||
|
called `labels`. `bootstrap --trace run` reads in the `labels` file if
|
||||||
|
it exists and prints out any matching label name as it traces each instruction
|
||||||
|
executed.
|
||||||
|
|
||||||
|
Here's a sample of what a trace looks like, with a few boxes highlighted:
|
||||||
|
|
||||||
|
<img alt='trace example' src='html/trace.png'>
|
||||||
|
|
||||||
|
Each of the green boxes shows the trace emitted for a single instruction.
|
||||||
|
It starts with a line of the form `run: inst: ___` followed by the opcode
|
||||||
|
for the instruction, the state of registers before the instruction executes,
|
||||||
|
and various other facts deduced during execution. Some instructions first
|
||||||
|
print a matching label. In the above screenshot, the red boxes show that
|
||||||
|
address `0x0900005e` maps to label `$loop` and presumably marks the start of
|
||||||
|
some loop. Function names get similar `run: == label` lines.
|
||||||
|
|
||||||
|
- One trick when emitting traces with labels:
|
||||||
|
```
|
||||||
|
$ grep label trace
|
||||||
|
```
|
||||||
|
This is useful for quickly showing you the control flow for the run, and the
|
||||||
|
function executing when the error occurred. I find it useful to start with
|
||||||
|
this information, only looking at the complete trace after I've gotten
|
||||||
|
oriented on the control flow. Did it get to the loop I just modified? How
|
||||||
|
many times did it go through the loop?
|
||||||
|
|
||||||
|
- Once you have SubX displaying labels in traces, it's a short step to modify
|
||||||
|
the program to insert more labels just to gain more insight. For example,
|
||||||
|
consider the following function:
|
||||||
|
|
||||||
|
<img alt='control example -- before' src='html/control0.png'>
|
||||||
|
|
||||||
|
This function contains a series of jump instructions. If a trace shows
|
||||||
|
`is-hex-lowercase-byte?` being encountered, and then `$is-hex-lowercase-byte?:end`
|
||||||
|
being encountered, it's still ambiguous what happened. Did we hit an early
|
||||||
|
exit, or did we execute all the way through? To clarify this, add temporary
|
||||||
|
labels after each jump:
|
||||||
|
|
||||||
|
<img alt='control example -- after' src='html/control1.png'>
|
||||||
|
|
||||||
|
Now the trace should have a lot more detail on which of these labels was
|
||||||
|
reached, and precisely when the exit was taken.
|
||||||
|
|
||||||
|
- If you find yourself wondering, "when did the contents of this memory
|
||||||
|
address change?", `bootstrap run` has some rudimentary support for _watch
|
||||||
|
points_. Just insert a label starting with `$watch-` before an instruction
|
||||||
|
that writes to the address, and its value will start getting dumped to the
|
||||||
|
trace after every instruction thereafter.
|
||||||
|
|
||||||
|
- Once we have a sense for precisely which instructions we want to look at,
|
||||||
|
it's time to look at the trace as a whole. Key is the state of registers
|
||||||
|
before each instruction. If a function is receiving bad arguments it becomes
|
||||||
|
natural to inspect what values were pushed on the stack before calling it,
|
||||||
|
tracing back further from there, and so on.
|
||||||
|
|
||||||
|
I occasionally want to see the precise state of the stack segment, in which
|
||||||
|
case I uncomment a commented-out call to `dump_stack()` in the `vm.cc`
|
||||||
|
layer. It makes the trace a lot more verbose and a lot less dense, necessitating
|
||||||
|
a lot more scrolling around, so I keep it turned off most of the time.
|
||||||
|
|
||||||
|
- If the trace seems overwhelming, try [browsing it](https://github.com/akkartik/mu/blob/master/tools/browse_trace.readme.md)
|
||||||
|
in the 'time-travel debugger'.
|
||||||
|
|
||||||
|
- Don't be afraid to slice and dice the trace using Unix tools. For example,
|
||||||
|
say you have a SubX binary that dies while running tests. You can see what
|
||||||
|
test it's segfaulting at by compiling it with debug information using
|
||||||
|
`./translate_subx_debug`, and then running:
|
||||||
|
|
||||||
|
```
|
||||||
|
./bootstrap --trace --dump run a.elf test 2>&1 |grep 'label test'
|
||||||
|
```
|
||||||
|
|
||||||
|
Just read out the last test printed out before the segfault.
|
||||||
|
|
||||||
|
Hopefully these hints are enough to get you started. The main thing to
|
||||||
|
remember is to not be afraid of modifying the sources. A good debugging
|
||||||
|
session gets into a nice rhythm of generating a trace, staring at it for a
|
||||||
|
while, modifying the sources, regenerating the trace, and so on. Email
|
||||||
|
[me](mailto:mu@akkartik.com) if you'd like another pair of eyes to stare at a
|
||||||
|
trace, or if you have questions or complaints.
|
|
@ -0,0 +1,19 @@
|
||||||
|
## Getting your editor set up
|
||||||
|
|
||||||
|
If you've read this far, it's time to set up your editor. Mu is really
|
||||||
|
intended to be read interactively rather than on a browser.
|
||||||
|
|
||||||
|
There is rudimentary syntax highlighting support for Mu and SubX files for
|
||||||
|
various editors. Look for your editor in `mu.*` and `subx.*`, and follow the
|
||||||
|
instructions within.
|
||||||
|
|
||||||
|
The Vim files are most developed. In particular, I recommend some optional
|
||||||
|
setup in subx.vim to use multiple colors for comments.
|
||||||
|
|
||||||
|
If you use [Exuberant Ctags](http://ctags.sourceforge.net) for jumping easily
|
||||||
|
from names to their definitions in your editor, copy the contents of `exuberant_ctags_rc`
|
||||||
|
into your `.ctags` file.
|
||||||
|
|
||||||
|
[Here](https://lobste.rs/s/qglfdp/subx_minimalist_assembly_language_for#c_o9ddqk)
|
||||||
|
are some tips on my setup for quickly finding the right opcode for any
|
||||||
|
situation from within Vim.
|
|
@ -1,81 +0,0 @@
|
||||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
|
|
||||||
<html>
|
|
||||||
<head>
|
|
||||||
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
|
|
||||||
<title>Mu - apps/ex2.2.mu</title>
|
|
||||||
<meta name="Generator" content="Vim/8.1">
|
|
||||||
<meta name="plugin-version" content="vim8.1_v1">
|
|
||||||
<meta name="syntax" content="none">
|
|
||||||
<meta name="settings" content="number_lines,use_css,pre_wrap,no_foldcolumn,expand_tabs,line_ids,prevent_copy=">
|
|
||||||
<meta name="colorscheme" content="minimal-light">
|
|
||||||
<style type="text/css">
|
|
||||||
<!--
|
|
||||||
pre { white-space: pre-wrap; font-family: monospace; color: #000000; background-color: #c6c6c6; }
|
|
||||||
body { font-size:12pt; font-family: monospace; color: #000000; background-color: #c6c6c6; }
|
|
||||||
a { color:inherit; }
|
|
||||||
* { font-size:12pt; font-size: 1em; }
|
|
||||||
.LineNr { }
|
|
||||||
.muFunction { color: #af5f00; text-decoration: underline; }
|
|
||||||
.SpecialChar { color: #d70000; }
|
|
||||||
.Comment { color: #005faf; }
|
|
||||||
.Constant { color: #008787; }
|
|
||||||
.Delimiter { color: #c000c0; }
|
|
||||||
.PreProc { color: #c000c0; }
|
|
||||||
-->
|
|
||||||
</style>
|
|
||||||
|
|
||||||
<script type='text/javascript'>
|
|
||||||
<!--
|
|
||||||
|
|
||||||
/* function to open any folds containing a jumped-to line before jumping to it */
|
|
||||||
function JumpToLine()
|
|
||||||
{
|
|
||||||
var lineNum;
|
|
||||||
lineNum = window.location.hash;
|
|
||||||
lineNum = lineNum.substr(1); /* strip off '#' */
|
|
||||||
|
|
||||||
if (lineNum.indexOf('L') == -1) {
|
|
||||||
lineNum = 'L'+lineNum;
|
|
||||||
}
|
|
||||||
var lineElem = document.getElementById(lineNum);
|
|
||||||
/* Always jump to new location even if the line was hidden inside a fold, or
|
|
||||||
* we corrected the raw number to a line ID.
|
|
||||||
*/
|
|
||||||
if (lineElem) {
|
|
||||||
lineElem.scrollIntoView(true);
|
|
||||||
}
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
if ('onhashchange' in window) {
|
|
||||||
window.onhashchange = JumpToLine;
|
|
||||||
}
|
|
||||||
|
|
||||||
-->
|
|
||||||
</script>
|
|
||||||
</head>
|
|
||||||
<body onload='JumpToLine();'>
|
|
||||||
<a href='https://github.com/akkartik/mu/blob/master/apps/ex2.2.mu'>https://github.com/akkartik/mu/blob/master/apps/ex2.2.mu</a>
|
|
||||||
<pre id='vimCodeElement'>
|
|
||||||
<span id="L1" class="LineNr"> 1 </span><span class="Comment"># Increment a number, and return the result in the exit code.</span>
|
|
||||||
<span id="L2" class="LineNr"> 2 </span><span class="Comment">#</span>
|
|
||||||
<span id="L3" class="LineNr"> 3 </span><span class="Comment"># To run:</span>
|
|
||||||
<span id="L4" class="LineNr"> 4 </span><span class="Comment"># $ ./translate_mu apps/ex2.2.mu</span>
|
|
||||||
<span id="L5" class="LineNr"> 5 </span><span class="Comment"># $ ./a.elf</span>
|
|
||||||
<span id="L6" class="LineNr"> 6 </span><span class="Comment"># Expected result:</span>
|
|
||||||
<span id="L7" class="LineNr"> 7 </span><span class="Comment"># $ echo $?</span>
|
|
||||||
<span id="L8" class="LineNr"> 8 </span><span class="Comment"># 7</span>
|
|
||||||
<span id="L9" class="LineNr"> 9 </span>
|
|
||||||
<span id="L10" class="LineNr">10 </span><span class="PreProc">fn</span> <span class="muFunction">main</span><span class="PreProc"> -> </span>result/<span class="Constant">ebx</span>: int <span class="Delimiter">{</span>
|
|
||||||
<span id="L11" class="LineNr">11 </span> result <span class="SpecialChar"><-</span> <a href='ex2.2.mu.html#L14'>foo</a>
|
|
||||||
<span id="L12" class="LineNr">12 </span><span class="Delimiter">}</span>
|
|
||||||
<span id="L13" class="LineNr">13 </span>
|
|
||||||
<span id="L14" class="LineNr">14 </span><span class="PreProc">fn</span> <span class="muFunction"><a href='ex2.2.mu.html#L14'>foo</a></span><span class="PreProc"> -> </span>result/<span class="Constant">ebx</span>: int <span class="Delimiter">{</span>
|
|
||||||
<span id="L15" class="LineNr">15 </span> <span class="PreProc">var</span> n: int
|
|
||||||
<span id="L16" class="LineNr">16 </span> copy-to n, <span class="Constant">3</span>
|
|
||||||
<span id="L17" class="LineNr">17 </span> increment n
|
|
||||||
<span id="L18" class="LineNr">18 </span> result <span class="SpecialChar"><-</span> copy n
|
|
||||||
<span id="L19" class="LineNr">19 </span><span class="Delimiter">}</span>
|
|
||||||
</pre>
|
|
||||||
</body>
|
|
||||||
</html>
|
|
||||||
<!-- vim: set foldmethod=manual : -->
|
|
|
@ -0,0 +1,456 @@
|
||||||
|
# Mu Syntax
|
||||||
|
|
||||||
|
Here are two valid statements in Mu:
|
||||||
|
|
||||||
|
```
|
||||||
|
increment x
|
||||||
|
y <- increment
|
||||||
|
```
|
||||||
|
|
||||||
|
Understanding when to use one vs the other is the critical idea in Mu. In
|
||||||
|
short, the former increments a value in memory, while the latter increments a
|
||||||
|
value in a register.
|
||||||
|
|
||||||
|
Most languages start from some syntax and do what it takes to implement it.
|
||||||
|
Mu, however, is designed as a safe way to program in [a regular subset of
|
||||||
|
32-bit x86 machine code](subx.md), _satisficing_ rather than optimizing for a
|
||||||
|
clean syntax. To keep the mapping to machine code lightweight, Mu exclusively
|
||||||
|
uses statements and most statements map to a single instruction of machine
|
||||||
|
code.
|
||||||
|
|
||||||
|
Since the x86 instruction set restricts how many memory locations an instruction
|
||||||
|
can use, Mu makes registers explicit as well. Variables must be explicitly
|
||||||
|
mapped to registers; otherwise they live in memory.
|
||||||
|
|
||||||
|
Statements consist of 3 parts: the operation, optional _inouts_ and optional
|
||||||
|
_outputs_. Outputs come before the operation name and `<-`.
|
||||||
|
|
||||||
|
Outputs are always registers; memory locations that need to be modified are
|
||||||
|
passed in by reference using the `addr` type described below.
|
||||||
|
|
||||||
|
So Mu programmers need to make two new categories of decisions: whether to
|
||||||
|
define variables in registers or memory, and whether to put variables to the
|
||||||
|
left or right. There's always exactly one way to write any given operation. In
|
||||||
|
return for this overhead you get a lightweight and future-proof stack. And Mu
|
||||||
|
will provide good error message to support you.
|
||||||
|
|
||||||
|
Further down, this page enumerates all available primitives in Mu, and [a
|
||||||
|
separate page](http://akkartik.github.io/mu/html/mu_instructions.html)
|
||||||
|
describes how each primitive is translated to machine code.
|
||||||
|
|
||||||
|
## Functions and calls
|
||||||
|
|
||||||
|
Zooming out from single statements, here's a complete sample program in Mu:
|
||||||
|
|
||||||
|
<img alt='ex2.mu' src='html/ex2.mu.png'>
|
||||||
|
|
||||||
|
Mu programs are lists of functions. Each function has the following form:
|
||||||
|
|
||||||
|
```
|
||||||
|
fn _name_ inout ... -> output ... {
|
||||||
|
_statement_
|
||||||
|
_statement_
|
||||||
|
...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Each function has a header line, and some number of statements, each on a
|
||||||
|
separate line. Headers describe inouts and outputs. Inouts can't be registers,
|
||||||
|
and outputs _must_ be registers.
|
||||||
|
|
||||||
|
The above program also demonstrates a function call (to the function `do-add`).
|
||||||
|
Function calls look the same as primitive statements: they can return (multiple)
|
||||||
|
outputs in registers, and modify inouts passed in by reference. In addition,
|
||||||
|
there's one more constraint: output registers must match the function header.
|
||||||
|
For example:
|
||||||
|
|
||||||
|
```
|
||||||
|
fn f -> x/eax: int {
|
||||||
|
...
|
||||||
|
}
|
||||||
|
fn g {
|
||||||
|
a/eax <- f # ok
|
||||||
|
a/ebx <- f # wrong
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The function `main` is special; it is where the program starts running. It
|
||||||
|
must always return a single int in register `ebx` (as the exit status of the
|
||||||
|
process). It can also optionally accept an array of strings as input (from the
|
||||||
|
shell command-line). To be precise, `main` must have one of the following
|
||||||
|
two signatures:
|
||||||
|
|
||||||
|
- `fn main -> x/ebx: int`
|
||||||
|
- `fn main args: (addr array (addr array byte)) -> x/ebx: int`
|
||||||
|
|
||||||
|
(The name of the output is flexible.)
|
||||||
|
|
||||||
|
## Blocks
|
||||||
|
|
||||||
|
Blocks are useful for grouping related statements. They're delimited by `{`
|
||||||
|
and `}`, both each alone on a line.
|
||||||
|
|
||||||
|
Blocks can nest:
|
||||||
|
|
||||||
|
```
|
||||||
|
{
|
||||||
|
_statements_
|
||||||
|
{
|
||||||
|
_more statements_
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Blocks can be named (with the name ending in a `:` on the same line as the
|
||||||
|
`{`):
|
||||||
|
|
||||||
|
```
|
||||||
|
$name: {
|
||||||
|
_statements_
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Further down we'll see primitive statements for skipping or repeating blocks.
|
||||||
|
Besides control flow, the other use for blocks is...
|
||||||
|
|
||||||
|
## Local variables
|
||||||
|
|
||||||
|
Functions can define new variables at any time with the keyword `var`. There
|
||||||
|
are two variants of the `var` statement, for defining variables in registers
|
||||||
|
or memory.
|
||||||
|
|
||||||
|
```
|
||||||
|
var name: type
|
||||||
|
var name/reg: type <- ...
|
||||||
|
```
|
||||||
|
|
||||||
|
Variables on the stack are never initialized. (They're always implicitly
|
||||||
|
zeroed them out.) Variables in registers are always initialized.
|
||||||
|
|
||||||
|
Variables exist from their definition until the end of their containing block.
|
||||||
|
Register variables may also die earlier if their register is clobbered by a
|
||||||
|
new variable.
|
||||||
|
|
||||||
|
## Arithmetic primitives
|
||||||
|
|
||||||
|
Here is the list of arithmetic primitive operations supported by Mu. The name
|
||||||
|
`n` indicates a literal integer rather than a variable, and `var/reg` indicates
|
||||||
|
a variable in a register.
|
||||||
|
|
||||||
|
```
|
||||||
|
var/reg <- increment
|
||||||
|
increment var
|
||||||
|
var/reg <- decrement
|
||||||
|
decrement var
|
||||||
|
var1/reg1 <- add var2/reg2
|
||||||
|
var/reg <- add var2
|
||||||
|
add-to var1, var2/reg
|
||||||
|
var/reg <- add n
|
||||||
|
add-to var, n
|
||||||
|
|
||||||
|
var1/reg1 <- sub var2/reg2
|
||||||
|
var/reg <- sub var2
|
||||||
|
sub-from var1, var2/reg
|
||||||
|
var/reg <- sub n
|
||||||
|
sub-from var, n
|
||||||
|
|
||||||
|
var1/reg1 <- and var2/reg2
|
||||||
|
var/reg <- and var2
|
||||||
|
and-with var1, var2/reg
|
||||||
|
var/reg <- and n
|
||||||
|
and-with var, n
|
||||||
|
|
||||||
|
var1/reg1 <- or var2/reg2
|
||||||
|
var/reg <- or var2
|
||||||
|
or-with var1, var2/reg
|
||||||
|
var/reg <- or n
|
||||||
|
or-with var, n
|
||||||
|
|
||||||
|
var1/reg1 <- xor var2/reg2
|
||||||
|
var/reg <- xor var2
|
||||||
|
xor-with var1, var2/reg
|
||||||
|
var/reg <- xor n
|
||||||
|
xor-with var, n
|
||||||
|
|
||||||
|
var/reg <- copy var2/reg2
|
||||||
|
copy-to var1, var2/reg
|
||||||
|
var/reg <- copy var2
|
||||||
|
var/reg <- copy n
|
||||||
|
copy-to var, n
|
||||||
|
|
||||||
|
compare var1, var2/reg
|
||||||
|
compare var1/reg, var2
|
||||||
|
compare var/eax, n
|
||||||
|
compare var, n
|
||||||
|
|
||||||
|
var/reg <- multiply var2
|
||||||
|
```
|
||||||
|
|
||||||
|
Any statement above that takes a variable in memory can be replaced with a
|
||||||
|
dereference (`*`) of an address variable (of type `(addr ...)`) in a register.
|
||||||
|
(Types can have multiple words, and are wrapped in `()` when they do.) But you
|
||||||
|
can't dereference variables in memory. You have to load them into a register
|
||||||
|
first.
|
||||||
|
|
||||||
|
Excluding dereferences, the above statements must operate on non-address
|
||||||
|
primitive types: `int` or `boolean`. (Booleans are really just `int`s, and Mu
|
||||||
|
assumes any value but `0` is true.)
|
||||||
|
|
||||||
|
## Operating on individual bytes
|
||||||
|
|
||||||
|
A special-case is variables of type 'byte'. Mu is a 32-bit platform so for the
|
||||||
|
most part only supports types that are multiples of 32 bits. However, we do
|
||||||
|
want to support strings in ASCII and UTF-8, which will be arrays of bytes.
|
||||||
|
|
||||||
|
Since most x86 instructions implicitly load 32 bits at a time from memory,
|
||||||
|
variables of type 'byte' are only allowed in registers, not on the stack. Here
|
||||||
|
are the possible statements for reading bytes to/from memory:
|
||||||
|
|
||||||
|
```
|
||||||
|
var/reg <- copy-byte var2/reg2 # var: byte, var2: byte
|
||||||
|
var/reg <- copy-byte *var2/reg2 # var: byte, var2: (addr byte)
|
||||||
|
copy-byte-to *var1/reg1, var2/reg2 # var1: (addr byte), var2: byte
|
||||||
|
```
|
||||||
|
|
||||||
|
In addition, variables of type 'byte' are restricted to (the lowest bytes of)
|
||||||
|
just 4 registers: eax, ecx, edx and ebx.
|
||||||
|
|
||||||
|
## Primitive jumps
|
||||||
|
|
||||||
|
There are two kinds of jumps, both with many variations: `break` and `loop`.
|
||||||
|
`break` instructions jump to the end of the containing block. `loop` instructions
|
||||||
|
jump to the beginning of the containing block.
|
||||||
|
|
||||||
|
All jumps can take an optional label starting with '$':
|
||||||
|
|
||||||
|
```
|
||||||
|
loop $foo
|
||||||
|
```
|
||||||
|
|
||||||
|
This instruction jumps to the beginning of the block called $foo. The corresponding
|
||||||
|
`break` jumps to the end of the block. Either jump statement must lie somewhere
|
||||||
|
inside such a block. Jumps are only legal to containing blocks. (Use named
|
||||||
|
blocks with restraint; jumps to places far away can get confusing.)
|
||||||
|
|
||||||
|
There are two unconditional jumps:
|
||||||
|
|
||||||
|
```
|
||||||
|
loop
|
||||||
|
loop label
|
||||||
|
break
|
||||||
|
break label
|
||||||
|
```
|
||||||
|
|
||||||
|
The remaining jump instructions are all conditional. Conditional jumps rely on
|
||||||
|
the result of the most recently executed `compare` instruction. (To keep
|
||||||
|
programs easy to read, keep compare instructions close to the jump that uses
|
||||||
|
them.)
|
||||||
|
|
||||||
|
```
|
||||||
|
break-if-=
|
||||||
|
break-if-= label
|
||||||
|
break-if-!=
|
||||||
|
break-if-!= label
|
||||||
|
```
|
||||||
|
|
||||||
|
Inequalities are similar, but have unsigned and signed variants. For simplicity,
|
||||||
|
always use signed integers; use the unsigned variants only to compare addresses.
|
||||||
|
|
||||||
|
```
|
||||||
|
break-if-<
|
||||||
|
break-if-< label
|
||||||
|
break-if->
|
||||||
|
break-if-> label
|
||||||
|
break-if-<=
|
||||||
|
break-if-<= label
|
||||||
|
break-if->=
|
||||||
|
break-if->= label
|
||||||
|
|
||||||
|
break-if-addr<
|
||||||
|
break-if-addr< label
|
||||||
|
break-if-addr>
|
||||||
|
break-if-addr> label
|
||||||
|
break-if-addr<=
|
||||||
|
break-if-addr<= label
|
||||||
|
break-if-addr>=
|
||||||
|
break-if-addr>= label
|
||||||
|
```
|
||||||
|
|
||||||
|
Similarly, conditional loops:
|
||||||
|
|
||||||
|
```
|
||||||
|
loop-if-=
|
||||||
|
loop-if-= label
|
||||||
|
loop-if-!=
|
||||||
|
loop-if-!= label
|
||||||
|
|
||||||
|
loop-if-<
|
||||||
|
loop-if-< label
|
||||||
|
loop-if->
|
||||||
|
loop-if-> label
|
||||||
|
loop-if-<=
|
||||||
|
loop-if-<= label
|
||||||
|
loop-if->=
|
||||||
|
loop-if->= label
|
||||||
|
|
||||||
|
loop-if-addr<
|
||||||
|
loop-if-addr< label
|
||||||
|
loop-if-addr>
|
||||||
|
loop-if-addr> label
|
||||||
|
loop-if-addr<=
|
||||||
|
loop-if-addr<= label
|
||||||
|
loop-if-addr>=
|
||||||
|
loop-if-addr>= label
|
||||||
|
```
|
||||||
|
|
||||||
|
## Addresses
|
||||||
|
|
||||||
|
Passing objects by reference requires the `address` operation, which returns
|
||||||
|
an object of type `addr`.
|
||||||
|
|
||||||
|
```
|
||||||
|
var/reg: (addr T) <- address var2: T
|
||||||
|
```
|
||||||
|
|
||||||
|
Here `var2` can't live in a register.
|
||||||
|
|
||||||
|
## Array operations
|
||||||
|
|
||||||
|
Mu arrays are size-prefixed so that operations on them can check bounds as
|
||||||
|
necessary at run-time. The `length` statement returns the number of elements
|
||||||
|
in an array.
|
||||||
|
|
||||||
|
```
|
||||||
|
var/reg: int <- length arr/reg: (addr array T)
|
||||||
|
```
|
||||||
|
|
||||||
|
The `index` statement takes an `addr` to an `array` and returns an `addr` to
|
||||||
|
one of its elements, that can be read from or written to.
|
||||||
|
|
||||||
|
```
|
||||||
|
var/reg: (addr T) <- index arr/reg: (addr array T), n
|
||||||
|
var/reg: (addr T) <- index arr: (array T sz), n
|
||||||
|
```
|
||||||
|
|
||||||
|
The index can also be a variable in a register, with a caveat:
|
||||||
|
|
||||||
|
```
|
||||||
|
var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: int
|
||||||
|
var/reg: (addr T) <- index arr: (array T sz), idx/reg: int
|
||||||
|
```
|
||||||
|
|
||||||
|
The caveat: the size of T must be 1, 2, 4 or 8 bytes. The x86 instruction set
|
||||||
|
has complex addressing modes that can index into an array in a single instruction
|
||||||
|
in these situations.
|
||||||
|
|
||||||
|
For types in general you'll need to split up the work, performing a `compute-offset`
|
||||||
|
before the `index`.
|
||||||
|
|
||||||
|
```
|
||||||
|
var/reg: (offset T) <- compute-offset arr: (addr array T), idx/reg: int # arr can be in reg or mem
|
||||||
|
var/reg: (offset T) <- compute-offset arr: (addr array T), idx: int # arr can be in reg or mem
|
||||||
|
```
|
||||||
|
|
||||||
|
The `compute-offset` statement returns a value of type `(offset T)` after
|
||||||
|
performing any necessary bounds checking. Now the offset can be passed to
|
||||||
|
`index` as usual:
|
||||||
|
|
||||||
|
```
|
||||||
|
var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: (offset T)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Compound types
|
||||||
|
|
||||||
|
Primitive types can be combined together using the `type` keyword. For
|
||||||
|
example:
|
||||||
|
|
||||||
|
```
|
||||||
|
type point {
|
||||||
|
x: int
|
||||||
|
y: int
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Mu programs are currently sequences of `fn` and `type` definitions.
|
||||||
|
|
||||||
|
To access within a compound type, use the `get` instruction. There are two
|
||||||
|
forms. You need either a variable of the type itself (say `T`) in memory, or a
|
||||||
|
variable of type `(addr T)` in a register.
|
||||||
|
|
||||||
|
```
|
||||||
|
var/reg: (addr T_f) <- get var/reg: (addr T), f
|
||||||
|
var/reg: (addr T_f) <- get var: T, f
|
||||||
|
```
|
||||||
|
|
||||||
|
The `f` here is the field name from the `type` definition, and `T_f` must
|
||||||
|
match the type of `f` in the `type` definition. For example, some legal
|
||||||
|
instructions for the definition of `point` above:
|
||||||
|
|
||||||
|
```
|
||||||
|
var a/eax: (addr int) <- get p, x
|
||||||
|
var a/eax: (addr int) <- get p, y
|
||||||
|
```
|
||||||
|
|
||||||
|
## Handles for safe access to the heap
|
||||||
|
|
||||||
|
We've seen the `addr` type, but it's intended to be short-lived. In particular,
|
||||||
|
you can't save `addr` values inside compound `type`s. To do that you need a
|
||||||
|
"fat pointer" called a `handle` that is safe to keep around for extended
|
||||||
|
periods and ensures it's used safely without corrupting the heap and causing
|
||||||
|
security issues or hard-to-debug misbehavior.
|
||||||
|
|
||||||
|
To actually _use_ a `handle`, we have to turn it into an `addr` first using
|
||||||
|
the `lookup` statement.
|
||||||
|
|
||||||
|
```
|
||||||
|
var y/reg: (addr T) <- lookup x
|
||||||
|
```
|
||||||
|
|
||||||
|
Now operate on the `addr` as usual, safe in the knowledge that you can later
|
||||||
|
recover any writes to its payload from `x`.
|
||||||
|
|
||||||
|
It's illegal to continue to use this `addr` after a function that reclaims
|
||||||
|
heap memory. You have to repeat the lookup from the `handle`. (Luckily Mu
|
||||||
|
doesn't implement reclamation yet.)
|
||||||
|
|
||||||
|
Having two kinds of addresses takes some getting used to. Do we pass in
|
||||||
|
variables by value, by `addr` or by `handle`? In inputs or outputs? Here are 3
|
||||||
|
rules:
|
||||||
|
|
||||||
|
* Functions that need to look at the payload should accept an `(addr ...)`.
|
||||||
|
* Functions that need to treat a handle as a value, without looking at its
|
||||||
|
payload, should accept a `(handle ...)`. Helpers that save handles into data
|
||||||
|
structures are a common example.
|
||||||
|
* Functions that need to allocate memory should accept an `(addr handle
|
||||||
|
...)`.
|
||||||
|
|
||||||
|
Try to avoid mixing these use cases.
|
||||||
|
|
||||||
|
You can copy handles to another variable on the stack like this:
|
||||||
|
|
||||||
|
```
|
||||||
|
var x: (handle T)
|
||||||
|
# ..some code initializing x..
|
||||||
|
var y/eax: (addr handle T) <- address ...
|
||||||
|
copy-handle x, y
|
||||||
|
```
|
||||||
|
|
||||||
|
You can also save handles inside compound types like this:
|
||||||
|
|
||||||
|
```
|
||||||
|
var y/reg: (addr handle T_f) <- get var: (addr T), f
|
||||||
|
copy-handle-to *y, x
|
||||||
|
```
|
||||||
|
|
||||||
|
Or this:
|
||||||
|
|
||||||
|
```
|
||||||
|
var y/reg: (addr handle T) <- index arr: (addr array handle T), n
|
||||||
|
copy-handle-to *y, x
|
||||||
|
```
|
||||||
|
|
||||||
|
## Conclusion
|
||||||
|
|
||||||
|
Anything not allowed here is forbidden. At least until you modify mu.subx.
|
||||||
|
Please [contact me](mailto:ak@akkartik.com) or [report issues](https://github.com/akkartik/mu/issues)
|
||||||
|
when you encounter a missing or misleading error message.
|
259
mu_summary
259
mu_summary
|
@ -1,259 +0,0 @@
|
||||||
Mu programs are lists of functions. Each function has the following form:
|
|
||||||
|
|
||||||
fn _name_ _inouts_with_types_ -> _outputs_with_types_ {
|
|
||||||
_instructions_
|
|
||||||
}
|
|
||||||
|
|
||||||
Each function has a header line, and some number of instructions, each on a
|
|
||||||
separate line.
|
|
||||||
|
|
||||||
Instructions may be primitives or function calls. Either way, all instructions
|
|
||||||
have one of the following forms:
|
|
||||||
|
|
||||||
# defining variables
|
|
||||||
var _name_: _type_
|
|
||||||
var _name_/_register_: _type_
|
|
||||||
|
|
||||||
# doing things with variables
|
|
||||||
_operation_ _inouts_
|
|
||||||
_outputs_ <- _operation_ _inouts_
|
|
||||||
|
|
||||||
Instructions and functions may have inouts and outputs. Both inouts and
|
|
||||||
outputs are variables.
|
|
||||||
|
|
||||||
As seen above, variables can be defined to live in a register, like this:
|
|
||||||
|
|
||||||
n/eax
|
|
||||||
|
|
||||||
Variables not assigned a register live in the stack.
|
|
||||||
|
|
||||||
Function inouts must always be on the stack, and outputs must always be in
|
|
||||||
registers. A function call must always write to the exact registers its
|
|
||||||
definition requires. For example:
|
|
||||||
|
|
||||||
fn foo -> x/eax: int {
|
|
||||||
...
|
|
||||||
}
|
|
||||||
fn main {
|
|
||||||
a/eax <- foo # ok
|
|
||||||
a/ebx <- foo # wrong
|
|
||||||
}
|
|
||||||
|
|
||||||
Primitive inouts may be on the stack or in registers, but outputs must always
|
|
||||||
be in registers.
|
|
||||||
|
|
||||||
Functions can contain nested blocks inside { and }. Variables defined in a
|
|
||||||
block don't exist outside it.
|
|
||||||
|
|
||||||
{
|
|
||||||
_instructions_
|
|
||||||
{
|
|
||||||
_more instructions_
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
Blocks can be named like so:
|
|
||||||
|
|
||||||
$name: {
|
|
||||||
_instructions_
|
|
||||||
}
|
|
||||||
|
|
||||||
## Primitive instructions
|
|
||||||
|
|
||||||
Primitive instructions currently supported in Mu ('n' indicates a literal
|
|
||||||
integer rather than a variable, and 'var/reg' indicates a variable in a
|
|
||||||
register):
|
|
||||||
|
|
||||||
var/reg <- increment
|
|
||||||
increment var
|
|
||||||
var/reg <- decrement
|
|
||||||
decrement var
|
|
||||||
var1/reg1 <- add var2/reg2
|
|
||||||
var/reg <- add var2
|
|
||||||
add-to var1, var2/reg
|
|
||||||
var/reg <- add n
|
|
||||||
add-to var, n
|
|
||||||
|
|
||||||
var1/reg1 <- sub var2/reg2
|
|
||||||
var/reg <- sub var2
|
|
||||||
sub-from var1, var2/reg
|
|
||||||
var/reg <- sub n
|
|
||||||
sub-from var, n
|
|
||||||
|
|
||||||
var1/reg1 <- and var2/reg2
|
|
||||||
var/reg <- and var2
|
|
||||||
and-with var1, var2/reg
|
|
||||||
var/reg <- and n
|
|
||||||
and-with var, n
|
|
||||||
|
|
||||||
var1/reg1 <- or var2/reg2
|
|
||||||
var/reg <- or var2
|
|
||||||
or-with var1, var2/reg
|
|
||||||
var/reg <- or n
|
|
||||||
or-with var, n
|
|
||||||
|
|
||||||
var1/reg1 <- xor var2/reg2
|
|
||||||
var/reg <- xor var2
|
|
||||||
xor-with var1, var2/reg
|
|
||||||
var/reg <- xor n
|
|
||||||
xor-with var, n
|
|
||||||
|
|
||||||
var/reg <- copy var2/reg2
|
|
||||||
copy-to var1, var2/reg
|
|
||||||
var/reg <- copy var2
|
|
||||||
var/reg <- copy n
|
|
||||||
copy-to var, n
|
|
||||||
|
|
||||||
compare var1, var2/reg
|
|
||||||
compare var1/reg, var2
|
|
||||||
compare var/eax, n
|
|
||||||
compare var, n
|
|
||||||
|
|
||||||
var/reg <- multiply var2
|
|
||||||
|
|
||||||
Notice that there are no primitive instructions operating on two variables in
|
|
||||||
memory. That's a restriction of the underlying x86 processor.
|
|
||||||
|
|
||||||
Any instruction above that takes a variable in memory can be replaced with a
|
|
||||||
dereference (`*`) of an address variable in a register. But you can't dereference
|
|
||||||
variables in memory.
|
|
||||||
|
|
||||||
## Byte operations
|
|
||||||
|
|
||||||
A special-case is variables of type 'byte'. Mu is a 32-bit platform so for the
|
|
||||||
most part only supports types that are multiples of 32 bits. However, we do
|
|
||||||
want to support strings in ASCII and UTF-8, which will be arrays of bytes.
|
|
||||||
|
|
||||||
Since most x86 instructions implicitly load 32 bits at a time from memory,
|
|
||||||
variables of type 'byte' are only allowed in registers, not on the stack. Here
|
|
||||||
are the possible instructions for reading bytes to/from memory:
|
|
||||||
|
|
||||||
var/reg <- copy-byte var2/reg2 # var: byte, var2: byte
|
|
||||||
var/reg <- copy-byte *var2/reg2 # var: byte, var2: (addr byte)
|
|
||||||
copy-byte-to *var1/reg1, var2/reg2 # var1: (addr byte), var2: byte
|
|
||||||
|
|
||||||
In addition, variables of type 'byte' are restricted to (the lowest bytes of)
|
|
||||||
just 4 registers: eax, ecx, edx and ebx.
|
|
||||||
|
|
||||||
## Primitive jump instructions
|
|
||||||
|
|
||||||
There are two kinds of jumps, both with many variations: `break` and `loop`.
|
|
||||||
`break` instructions jump to the end of the containing block. `loop` instructions
|
|
||||||
jump to the beginning of the containing block.
|
|
||||||
|
|
||||||
Jumps can take an optional label starting with '$':
|
|
||||||
|
|
||||||
loop $foo
|
|
||||||
|
|
||||||
This instruction jumps to the beginning of the block called $foo. It must lie
|
|
||||||
somewhere inside such a block. Jumps are only legal to containing blocks. Use
|
|
||||||
named blocks with restraint; jumps to places far away can get confusing.
|
|
||||||
|
|
||||||
There are two unconditional jumps:
|
|
||||||
|
|
||||||
loop
|
|
||||||
loop label
|
|
||||||
break
|
|
||||||
break label
|
|
||||||
|
|
||||||
The remaining jump instructions are all conditional. Conditional jumps rely on
|
|
||||||
the result of the most recently executed `compare` instruction. (To keep
|
|
||||||
programs easy to read, keep compare instructions close to the jump that uses
|
|
||||||
them.)
|
|
||||||
|
|
||||||
break-if-=
|
|
||||||
break-if-= label
|
|
||||||
break-if-!=
|
|
||||||
break-if-!= label
|
|
||||||
|
|
||||||
Inequalities are similar, but have unsigned and signed variants. We assume
|
|
||||||
unsigned variants are only ever used to compare addresses.
|
|
||||||
|
|
||||||
break-if-<
|
|
||||||
break-if-< label
|
|
||||||
break-if->
|
|
||||||
break-if-> label
|
|
||||||
break-if-<=
|
|
||||||
break-if-<= label
|
|
||||||
break-if->=
|
|
||||||
break-if->= label
|
|
||||||
|
|
||||||
break-if-addr<
|
|
||||||
break-if-addr< label
|
|
||||||
break-if-addr>
|
|
||||||
break-if-addr> label
|
|
||||||
break-if-addr<=
|
|
||||||
break-if-addr<= label
|
|
||||||
break-if-addr>=
|
|
||||||
break-if-addr>= label
|
|
||||||
|
|
||||||
Similarly, conditional loops:
|
|
||||||
|
|
||||||
loop-if-=
|
|
||||||
loop-if-= label
|
|
||||||
loop-if-!=
|
|
||||||
loop-if-!= label
|
|
||||||
|
|
||||||
loop-if-<
|
|
||||||
loop-if-< label
|
|
||||||
loop-if->
|
|
||||||
loop-if-> label
|
|
||||||
loop-if-<=
|
|
||||||
loop-if-<= label
|
|
||||||
loop-if->=
|
|
||||||
loop-if->= label
|
|
||||||
|
|
||||||
loop-if-addr<
|
|
||||||
loop-if-addr< label
|
|
||||||
loop-if-addr>
|
|
||||||
loop-if-addr> label
|
|
||||||
loop-if-addr<=
|
|
||||||
loop-if-addr<= label
|
|
||||||
loop-if-addr>=
|
|
||||||
loop-if-addr>= label
|
|
||||||
|
|
||||||
## Address operations
|
|
||||||
|
|
||||||
var/reg: (addr T) <- address var: T # var must be in mem (on the stack)
|
|
||||||
|
|
||||||
## Array operations
|
|
||||||
|
|
||||||
var/reg: int <- length arr/reg: (addr array T)
|
|
||||||
var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: int
|
|
||||||
var/reg: (addr T) <- index arr: (array T sz), idx/reg: int
|
|
||||||
var/reg: (addr T) <- index arr/reg: (addr array T), n
|
|
||||||
var/reg: (addr T) <- index arr: (array T sz), n
|
|
||||||
|
|
||||||
var/reg: (offset T) <- compute-offset arr: (addr array T), idx/reg: int # arr can be in reg or mem
|
|
||||||
var/reg: (offset T) <- compute-offset arr: (addr array T), idx: int # arr can be in reg or mem
|
|
||||||
var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: (offset T)
|
|
||||||
|
|
||||||
## User-defined types
|
|
||||||
|
|
||||||
var/reg: (addr T_f) <- get var/reg: (addr T), f
|
|
||||||
where record (product) type T has elements a, b, c, ... of types T_a, T_b, T_c, ...
|
|
||||||
var/reg: (addr T_f) <- get var: T, f
|
|
||||||
|
|
||||||
## Handles for safe access to the heap
|
|
||||||
|
|
||||||
Say we created a handle like this on the stack (it can't be in a register)
|
|
||||||
var x: (handle T)
|
|
||||||
allocate Heap, T, x
|
|
||||||
|
|
||||||
You can copy handles to another variable on the stack like this:
|
|
||||||
var y: (handle T)
|
|
||||||
copy-handle-to y, x
|
|
||||||
|
|
||||||
You can also save handles inside other user-defined types like this:
|
|
||||||
var y/reg: (addr handle T_f) <- get var: (addr T), f
|
|
||||||
copy-handle-to *y, x
|
|
||||||
|
|
||||||
Or this:
|
|
||||||
var y/reg: (addr handle T) <- index arr: (addr array handle T), n
|
|
||||||
copy-handle-to *y, x
|
|
||||||
|
|
||||||
Handles can be converted into addresses like this:
|
|
||||||
var y/reg: (addr T) <- lookup x
|
|
||||||
|
|
||||||
It's illegal to continue to use this addr after a function that reclaims heap
|
|
||||||
memory. You have to repeat the lookup.
|
|
|
@ -0,0 +1,162 @@
|
||||||
|
## SubX
|
||||||
|
|
||||||
|
SubX is a notation for a subset of x86 machine code. [The Mu translator](http://akkartik.github.io/mu/html/apps/mu.subx.html)
|
||||||
|
is implemented in SubX and also emits SubX code.
|
||||||
|
|
||||||
|
Here's an example program in SubX that adds 1 and 1 and returns the result to
|
||||||
|
the parent shell process:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
== code
|
||||||
|
Entry:
|
||||||
|
# ebx = 1
|
||||||
|
bb/copy-to-ebx 1/imm32
|
||||||
|
# increment ebx
|
||||||
|
43/increment-ebx
|
||||||
|
# exit(ebx)
|
||||||
|
e8/call syscall_exit/disp32
|
||||||
|
```
|
||||||
|
|
||||||
|
## The syntax of SubX instructions
|
||||||
|
|
||||||
|
Just like in regular machine code, SubX programs consist mostly of instructions,
|
||||||
|
which are basically sequences of numbers (always in hex). Instructions consist
|
||||||
|
of words separated by whitespace. Words may be _opcodes_ (defining the
|
||||||
|
operation being performed) or _arguments_ (specifying the data the operation
|
||||||
|
acts on). Any word can have extra _metadata_ attached to it after `/`. Some
|
||||||
|
metadata is required (like the `/imm32` and `/imm8` above), but unrecognized
|
||||||
|
metadata is silently skipped so you can attach comments to words (like the
|
||||||
|
instruction name `/copy-to-eax` above, or the `/exit` argument).
|
||||||
|
|
||||||
|
What do all these numbers mean? SubX supports a small subset of the 32-bit x86
|
||||||
|
instruction set that likely runs on your computer. (Think of the name as short
|
||||||
|
for "sub-x86".) The instruction set contains instructions like `89/copy`,
|
||||||
|
`01/add`, `3d/compare` and `51/push-ecx` which modify registers and a byte-addressable
|
||||||
|
memory. For a complete list of supported instructions, run `bootstrap help
|
||||||
|
opcodes`.
|
||||||
|
|
||||||
|
The registers instructions operate on are as follows:
|
||||||
|
|
||||||
|
- Six general-purpose 32-bit registers: `0/eax`, `1/ebx`, `2/ecx`, `3/edx`,
|
||||||
|
`6/esi` and `7/edi`.
|
||||||
|
- Two additional 32-bit registers: `4/esp` and `5/ebp`. (I suggest you only
|
||||||
|
use these to manage the call stack.)
|
||||||
|
|
||||||
|
(SubX doesn't support floating-point registers yet. Intel processors support
|
||||||
|
an 8-bit mode, 16-bit mode and 64-bit mode. SubX will never support them.
|
||||||
|
There are also _many_ more instructions that SubX will never support.)
|
||||||
|
|
||||||
|
While SubX doesn't provide the usual mnemonics for opcodes, it _does_ provide
|
||||||
|
error-checking. If you miss an argument or accidentally add an extra argument,
|
||||||
|
you'll get a nice error. SubX won't arbitrarily interpret bytes of data as
|
||||||
|
instructions or vice versa.
|
||||||
|
|
||||||
|
It's worth distinguishing between an instruction's arguments and its _operands_.
|
||||||
|
Arguments are provided directly in instructions. Operands are pieces of data
|
||||||
|
in register or memory that are operated on by instructions.
|
||||||
|
|
||||||
|
Intel processors typically operate on no more than two operands, and at most
|
||||||
|
one of them (the 'reg/mem' operand) can access memory. The address of the
|
||||||
|
reg/mem operand is constructed by expressions of one of these forms:
|
||||||
|
|
||||||
|
- `%reg`: operate on just a register, not memory
|
||||||
|
- `*reg`: look up memory with the address in some register
|
||||||
|
- `*(reg + disp)`: add a constant to the address in some register
|
||||||
|
- `*(base + (index << scale) + disp)` where `base` and `index` are registers,
|
||||||
|
and `scale` and `disp` are 2- and 32-bit constants respectively.
|
||||||
|
|
||||||
|
Under the hood, SubX turns expressions of these forms into multiple arguments
|
||||||
|
with metadata in some complex ways. See [the doc on bare SubX](subx_bare.md).
|
||||||
|
|
||||||
|
That covers the complexities of the reg/mem operand. The second operand is
|
||||||
|
simpler. It comes from exactly one of the following argument types:
|
||||||
|
|
||||||
|
- `/r32`
|
||||||
|
- displacement: `/disp8` or `/disp32`
|
||||||
|
- immediate: `/imm8` or `/imm32`
|
||||||
|
|
||||||
|
Putting all this together, here's an example that adds the integer in `eax` to
|
||||||
|
the one at address `edx`:
|
||||||
|
|
||||||
|
```
|
||||||
|
01/add %edx 0/r32/eax
|
||||||
|
```
|
||||||
|
|
||||||
|
## The syntax of SubX programs
|
||||||
|
|
||||||
|
SubX programs map to the same ELF binaries that a conventional Linux system
|
||||||
|
uses. Linux ELF binaries consist of a series of _segments_. In particular, they
|
||||||
|
distinguish between code and data. Correspondingly, SubX programs consist of a
|
||||||
|
series of segments, each starting with a header line: `==` followed by a name
|
||||||
|
and approximate starting address.
|
||||||
|
|
||||||
|
All code must lie in a segment called 'code'.
|
||||||
|
|
||||||
|
Segments can be added to.
|
||||||
|
|
||||||
|
```sh
|
||||||
|
== code 0x09000000 # first mention requires starting address
|
||||||
|
...A...
|
||||||
|
|
||||||
|
== data 0x0a000000
|
||||||
|
...B...
|
||||||
|
|
||||||
|
== code # no address necessary when adding
|
||||||
|
...C...
|
||||||
|
```
|
||||||
|
|
||||||
|
The `code` segment now contains the instructions of `A` as well as `C`.
|
||||||
|
|
||||||
|
Within the `code` segment, each line contains a comment, label or instruction.
|
||||||
|
Comments start with a `#` and are ignored. Labels should always be the first
|
||||||
|
word on a line, and they end with a `:`.
|
||||||
|
|
||||||
|
Instructions can refer to labels in displacement or immediate arguments, and
|
||||||
|
they'll obtain a value based on the address of the label: immediate arguments
|
||||||
|
will contain the address directly, while displacement arguments will contain
|
||||||
|
the difference between the address and the address of the current instruction.
|
||||||
|
The latter is mostly useful for `jump` and `call` instructions.
|
||||||
|
|
||||||
|
Functions are defined using labels. By convention, labels internal to functions
|
||||||
|
(that must only be jumped to) start with a `$`. Any other labels must only be
|
||||||
|
called, never jumped to. All labels must be unique.
|
||||||
|
|
||||||
|
Functions are called using the following syntax:
|
||||||
|
```
|
||||||
|
(func arg1 arg2 ...)
|
||||||
|
```
|
||||||
|
|
||||||
|
Function arguments must be either literals (integers or strings) or a reg/mem
|
||||||
|
operand using the syntax in the previous section.
|
||||||
|
|
||||||
|
A special label is `Entry`, which can be used to specify/override the entry
|
||||||
|
point of the program. It doesn't have to be unique, and the latest definition
|
||||||
|
will override earlier ones.
|
||||||
|
|
||||||
|
(The `Entry` label, along with duplicate segment headers, allows programs to
|
||||||
|
be built up incrementally out of multiple [_layers_](http://akkartik.name/post/wart-layers).)
|
||||||
|
|
||||||
|
Another special pair of labels are the block delimiters `{` and `}`. They can
|
||||||
|
be nested, and jump instructions can take arguments `loop` or `break` that
|
||||||
|
jump to the enclosing `{` and `}` respectively.
|
||||||
|
|
||||||
|
The data segment consists of labels as before and byte values. Referring to
|
||||||
|
data labels in either `code` segment instructions or `data` segment values
|
||||||
|
yields their address.
|
||||||
|
|
||||||
|
Automatic tests are an important part of SubX, and there's a simple mechanism
|
||||||
|
to provide a test harness: all functions that start with `test-` are called in
|
||||||
|
turn by a special, auto-generated function called `run-tests`. How you choose
|
||||||
|
to call it is up to you.
|
||||||
|
|
||||||
|
I try to keep things simple so that there's less work to do when implementing
|
||||||
|
SubX in SubX. But there _is_ one convenience: instructions can provide a
|
||||||
|
string literal surrounded by quotes (`"`) in an `imm32` argument. SubX will
|
||||||
|
transparently copy it to the `data` segment and replace it with its address.
|
||||||
|
Strings are the only place where a SubX word is allowed to contain spaces.
|
||||||
|
|
||||||
|
That should be enough information for writing SubX programs. The `apps/`
|
||||||
|
directory provides some fodder for practice in the `apps/ex*.subx` files,
|
||||||
|
giving a more gradual introduction to SubX features. In particular, you should
|
||||||
|
work through `apps/factorial4.subx`, which demonstrates all the above ideas in
|
||||||
|
concert.
|
|
@ -1,7 +1,7 @@
|
||||||
[The Readme](Readme.md) describes SubX notation with some details hidden
|
[The SubX documentation](subx.md) describes SubX notation with some details
|
||||||
behind _syntax sugar_ -- local rewrite rules that make programming in SubX
|
hidden behind _syntax sugar_ -- local rewrite rules that make programming in
|
||||||
less error-prone. However, much low-level SubX (before the syntax sugar is
|
SubX less error-prone. However, much low-level SubX (before the syntax sugar
|
||||||
implemented) is written without syntax sugar. This document describes some
|
is implemented) is written without syntax sugar. This document describes some
|
||||||
details of the syntax sugar: how the reg/mem operand is translated into
|
details of the syntax sugar: how the reg/mem operand is translated into
|
||||||
arguments.
|
arguments.
|
||||||
|
|
||||||
|
@ -123,3 +123,50 @@ $ echo $?
|
||||||
|
|
||||||
These details should now be enough information for reading and modifying
|
These details should now be enough information for reading and modifying
|
||||||
low-level SubX programs.
|
low-level SubX programs.
|
||||||
|
|
||||||
|
## Translating SubX programs
|
||||||
|
|
||||||
|
This repo includes two translators for bare SubX. The first is [the bootstrap
|
||||||
|
translator](bootstrap.md) implemented in C++. In addition, you can use SubX to
|
||||||
|
translate itself. For example, running natively on Linux:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
# generate translator phases using the C++ translator
|
||||||
|
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/hex.subx -o hex
|
||||||
|
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/survey.subx -o survey
|
||||||
|
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/pack.subx -o pack
|
||||||
|
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/assort.subx -o assort
|
||||||
|
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/dquotes.subx -o dquotes
|
||||||
|
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/tests.subx -o tests
|
||||||
|
$ chmod +x hex survey pack assort dquotes tests
|
||||||
|
|
||||||
|
# use the generated translator phases to translate SubX programs
|
||||||
|
$ cat init.linux apps/ex1.subx |./tests |./dquotes |./assort |./pack |./survey |./hex > a.elf
|
||||||
|
$ chmod +x a.elf
|
||||||
|
$ ./a.elf
|
||||||
|
$ echo $?
|
||||||
|
42
|
||||||
|
|
||||||
|
# or, automating the above steps
|
||||||
|
$ ./translate_subx init.linux apps/ex1.subx
|
||||||
|
$ ./a.elf
|
||||||
|
$ echo $?
|
||||||
|
42
|
||||||
|
```
|
||||||
|
|
||||||
|
Or, running in a VM on other platforms (much slower):
|
||||||
|
|
||||||
|
```sh
|
||||||
|
$ ./translate_subx_emulated init.linux apps/ex1.subx # generates identical a.elf to above
|
||||||
|
$ ./bootstrap run a.elf
|
||||||
|
$ echo $?
|
||||||
|
42
|
||||||
|
```
|
||||||
|
|
||||||
|
## Resources
|
||||||
|
|
||||||
|
- [Single-page cheatsheet for the x86 ISA](https://net.cs.uni-bonn.de/fileadmin/user_upload/plohmann/x86_opcode_structure_and_instruction_overview.pdf)
|
||||||
|
(pdf; [cached local copy](https://github.com/akkartik/mu/blob/master/cheatsheet.pdf))
|
||||||
|
- [Concise reference for the x86 ISA](https://c9x.me/x86)
|
||||||
|
- [Concise reference for the x86 ISA #2](http://ref.x86asm.net/coder32.html)
|
||||||
|
- [Intel processor manual](http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf) (pdf)
|
15
test_apps
15
test_apps
|
@ -47,11 +47,11 @@ echo ex2
|
||||||
test "$1" = 'record' || git diff --exit-code apps/ex2
|
test "$1" = 'record' || git diff --exit-code apps/ex2
|
||||||
test $EMULATED && {
|
test $EMULATED && {
|
||||||
./bootstrap run apps/ex2 || ret=$?
|
./bootstrap run apps/ex2 || ret=$?
|
||||||
test $ret -eq 2 # 1 + 1
|
test $ret -eq 7 # 3 + 4
|
||||||
}
|
}
|
||||||
test $NATIVE && {
|
test $NATIVE && {
|
||||||
apps/ex2 || ret=$?
|
apps/ex2 || ret=$?
|
||||||
test $ret -eq 2 # 1 + 1
|
test $ret -eq 7 # 3 + 4
|
||||||
}
|
}
|
||||||
|
|
||||||
echo ex3
|
echo ex3
|
||||||
|
@ -365,17 +365,6 @@ test $NATIVE && {
|
||||||
test $ret -eq 7
|
test $ret -eq 7
|
||||||
}
|
}
|
||||||
|
|
||||||
echo ex2.2.mu
|
|
||||||
./translate_mu apps/ex2.2.mu
|
|
||||||
test $EMULATED && {
|
|
||||||
./bootstrap run a.elf || ret=$?
|
|
||||||
test $ret -eq 4
|
|
||||||
}
|
|
||||||
test $NATIVE && {
|
|
||||||
./a.elf || ret=$?
|
|
||||||
test $ret -eq 4
|
|
||||||
}
|
|
||||||
|
|
||||||
echo ex3.mu
|
echo ex3.mu
|
||||||
./translate_mu apps/ex3.mu
|
./translate_mu apps/ex3.mu
|
||||||
test $EMULATED && {
|
test $EMULATED && {
|
||||||
|
|
|
@ -0,0 +1,208 @@
|
||||||
|
## Reference documentation on available primitives
|
||||||
|
|
||||||
|
### Data Structures
|
||||||
|
|
||||||
|
- Kernel strings: null-terminated regions of memory. Unsafe and to be avoided,
|
||||||
|
but needed for interacting with the kernel.
|
||||||
|
|
||||||
|
- Arrays: length-prefixed regions of memory containing multiple elements of a
|
||||||
|
single type. Contents are preceded by 4 bytes (32 bits) containing the
|
||||||
|
`length` of the array in bytes.
|
||||||
|
|
||||||
|
- Slices: a pair of 32-bit addresses denoting a [half-open](https://en.wikipedia.org/wiki/Interval_(mathematics))
|
||||||
|
\[`start`, `end`) interval to live memory with a consistent lifetime.
|
||||||
|
|
||||||
|
Invariant: `start` <= `end`
|
||||||
|
|
||||||
|
- Streams: strings prefixed by 32-bit `write` and `read` indexes that the next
|
||||||
|
write or read goes to, respectively.
|
||||||
|
|
||||||
|
- offset 0: write index
|
||||||
|
- offset 4: read index
|
||||||
|
- offset 8: length of array (in bytes)
|
||||||
|
- offset 12: start of array data
|
||||||
|
|
||||||
|
Invariant: 0 <= `read` <= `write` <= `length`
|
||||||
|
|
||||||
|
- File descriptors (fd): Low-level 32-bit integers that the kernel uses to
|
||||||
|
track files opened by the program.
|
||||||
|
|
||||||
|
- File: 32-bit value containing either a fd or an address to a stream (fake
|
||||||
|
file).
|
||||||
|
|
||||||
|
- Buffered files (buffered-file): Contain a file descriptor and a stream for
|
||||||
|
buffering reads/writes. Each `buffered-file` must exclusively perform either
|
||||||
|
reads or writes.
|
||||||
|
|
||||||
|
### 'system calls'
|
||||||
|
|
||||||
|
As I said at the top, a primary design goal of SubX (and Mu more broadly) is
|
||||||
|
to explore ways to turn arbitrary manual tests into reproducible automated
|
||||||
|
tests. SubX aims for this goal by baking testable interfaces deep into the
|
||||||
|
stack, at the OS syscall level. The idea is that every syscall that interacts
|
||||||
|
with hardware (and so the environment) should be *dependency injected* so that
|
||||||
|
it's possible to insert fake hardware in tests.
|
||||||
|
|
||||||
|
But those are big goals. Here are the syscalls I have so far:
|
||||||
|
|
||||||
|
- `write`: takes two arguments, a file `f` and an address to array `s`.
|
||||||
|
|
||||||
|
Comparing this interface with the Unix `write()` syscall shows two benefits:
|
||||||
|
|
||||||
|
1. SubX can handle 'fake' file descriptors in tests.
|
||||||
|
|
||||||
|
1. `write()` accepts buffer and its length in separate arguments, which
|
||||||
|
requires callers to manage the two separately and so can be error-prone.
|
||||||
|
SubX's wrapper keeps the two together to increase the chances that we
|
||||||
|
never accidentally go out of array bounds.
|
||||||
|
|
||||||
|
- `read`: takes two arguments, a file `f` and an address to stream `s`. Reads
|
||||||
|
as much data from `f` as can fit in (the free space of) `s`.
|
||||||
|
|
||||||
|
Like with `write()`, this wrapper around the Unix `read()` syscall adds the
|
||||||
|
ability to handle 'fake' file descriptors in tests, and reduces the chances
|
||||||
|
of clobbering outside array bounds.
|
||||||
|
|
||||||
|
One bit of weirdness here: in tests we do a redundant copy from one stream
|
||||||
|
to another. See [the comments before the implementation](http://akkartik.github.io/mu/html/060read.subx.html)
|
||||||
|
for a discussion of alternative interfaces.
|
||||||
|
|
||||||
|
- `stop`: takes two arguments:
|
||||||
|
- `ed` is an address to an _exit descriptor_. Exit descriptors allow us to
|
||||||
|
`exit()` the program in production, but return to the test harness within
|
||||||
|
tests. That allows tests to make assertions about when `exit()` is called.
|
||||||
|
- `value` is the status code to `exit()` with.
|
||||||
|
|
||||||
|
For more details on exit descriptors and how to create one, see [the
|
||||||
|
comments before the implementation](http://akkartik.github.io/mu/html/059stop.subx.html).
|
||||||
|
|
||||||
|
- `new-segment`
|
||||||
|
|
||||||
|
Allocates a whole new segment of memory for the program, discontiguous with
|
||||||
|
both existing code and data (heap) segments. Just a more opinionated form of
|
||||||
|
[`mmap`](http://man7.org/linux/man-pages/man2/mmap.2.html).
|
||||||
|
|
||||||
|
- `allocate`: takes two arguments, an address to allocation-descriptor `ad`
|
||||||
|
and an integer `n`
|
||||||
|
|
||||||
|
Allocates a contiguous range of memory that is guaranteed to be exclusively
|
||||||
|
available to the caller. Returns the starting address to the range in `eax`.
|
||||||
|
|
||||||
|
An allocation descriptor tracks allocated vs available addresses in some
|
||||||
|
contiguous range of memory. The int specifies the number of bytes to allocate.
|
||||||
|
|
||||||
|
Explicitly passing in an allocation descriptor allows for nested memory
|
||||||
|
management, where a sub-system gets a chunk of memory and further parcels it
|
||||||
|
out to individual allocations. Particularly helpful for (surprise) tests.
|
||||||
|
|
||||||
|
- ... _(to be continued)_
|
||||||
|
|
||||||
|
I will continue to import syscalls over time from [the old Mu VM in the parent
|
||||||
|
directory](https://github.com/akkartik/mu), which has experimented with
|
||||||
|
interfaces for the screen, keyboard, mouse, disk and network.
|
||||||
|
|
||||||
|
### primitives built atop system calls
|
||||||
|
|
||||||
|
_(Compound arguments are usually passed in by reference. Where the results are
|
||||||
|
compound objects that don't fit in a register, the caller usually passes in
|
||||||
|
allocated memory for it.)_
|
||||||
|
|
||||||
|
#### assertions for tests
|
||||||
|
- `check-ints-equal`: fails current test if given ints aren't equal
|
||||||
|
- `check-stream-equal`: fails current test if stream doesn't match string
|
||||||
|
- `check-next-stream-line-equal`: fails current test if next line of stream
|
||||||
|
until newline doesn't match string
|
||||||
|
|
||||||
|
#### error handling
|
||||||
|
- `error`: takes three arguments, an exit-descriptor, a file and a string (message)
|
||||||
|
|
||||||
|
Prints out the message to the file and then exits using the provided
|
||||||
|
exit-descriptor.
|
||||||
|
|
||||||
|
- `error-byte`: like `error` but takes an extra byte value that it prints out
|
||||||
|
at the end of the message.
|
||||||
|
|
||||||
|
#### predicates
|
||||||
|
- `kernel-string-equal?`: compares a kernel string with a string
|
||||||
|
- `string-equal?`: compares two strings
|
||||||
|
- `stream-data-equal?`: compares a stream with a string
|
||||||
|
- `next-stream-line-equal?`: compares with string the next line in a stream, from
|
||||||
|
`read` index to newline
|
||||||
|
|
||||||
|
- `slice-empty?`: checks if the `start` and `end` of a slice are equal
|
||||||
|
- `slice-equal?`: compares a slice with a string
|
||||||
|
- `slice-starts-with?`: compares the start of a slice with a string
|
||||||
|
- `slice-ends-with?`: compares the end of a slice with a string
|
||||||
|
|
||||||
|
#### writing to disk
|
||||||
|
- `write`: string -> file
|
||||||
|
- Can also be used to cat a string into a stream.
|
||||||
|
- Will abort the entire program if destination is a stream and doesn't have
|
||||||
|
enough room.
|
||||||
|
- `write-stream`: stream -> file
|
||||||
|
- Can also be used to cat one stream into another.
|
||||||
|
- Will abort the entire program if destination is a stream and doesn't have
|
||||||
|
enough room.
|
||||||
|
- `write-slice`: slice -> stream
|
||||||
|
- Will abort the entire program if there isn't enough room in the
|
||||||
|
destination stream.
|
||||||
|
- `append-byte`: int -> stream
|
||||||
|
- Will abort the entire program if there isn't enough room in the
|
||||||
|
destination stream.
|
||||||
|
- `append-byte-hex`: int -> stream
|
||||||
|
- textual representation in hex, no '0x' prefix
|
||||||
|
- Will abort the entire program if there isn't enough room in the
|
||||||
|
destination stream.
|
||||||
|
- `print-int32`: int -> stream
|
||||||
|
- textual representation in hex, including '0x' prefix
|
||||||
|
- Will abort the entire program if there isn't enough room in the
|
||||||
|
destination stream.
|
||||||
|
- `write-buffered`: string -> buffered-file
|
||||||
|
- `write-slice-buffered`: slice -> buffered-file
|
||||||
|
- `flush`: buffered-file
|
||||||
|
- `write-byte-buffered`: int -> buffered-file
|
||||||
|
- `print-byte-buffered`: int -> buffered-file
|
||||||
|
- textual representation in hex, no '0x' prefix
|
||||||
|
- `print-int32-buffered`: int -> buffered-file
|
||||||
|
- textual representation in hex, including '0x' prefix
|
||||||
|
|
||||||
|
#### reading from disk
|
||||||
|
- `read`: file -> stream
|
||||||
|
- Can also be used to cat one stream into another.
|
||||||
|
- Will silently stop reading when destination runs out of space.
|
||||||
|
- `read-byte-buffered`: buffered-file -> byte
|
||||||
|
- `read-line-buffered`: buffered-file -> stream
|
||||||
|
- Will abort the entire program if there isn't enough room.
|
||||||
|
|
||||||
|
#### non-IO operations on streams
|
||||||
|
- `new-stream`: allocates space for a stream of `n` elements, each occupying
|
||||||
|
`b` bytes.
|
||||||
|
- Will abort the entire program if `n*b` requires more than 32 bits.
|
||||||
|
- `clear-stream`: resets everything in the stream to `0` (except its `length`).
|
||||||
|
- `rewind-stream`: resets the read index of the stream to `0` without modifying
|
||||||
|
its contents.
|
||||||
|
|
||||||
|
#### reading/writing hex representations of integers
|
||||||
|
- `is-hex-int?`: takes a slice argument, returns boolean result in `eax`
|
||||||
|
- `parse-hex-int`: takes a slice argument, returns int result in `eax`
|
||||||
|
- `is-hex-digit?`: takes a 32-bit word containing a single byte, returns
|
||||||
|
boolean result in `eax`.
|
||||||
|
- `from-hex-char`: takes a hexadecimal digit character in `eax`, returns its
|
||||||
|
numeric value in `eax`
|
||||||
|
- `to-hex-char`: takes a single-digit numeric value in `eax`, returns its
|
||||||
|
corresponding hexadecimal character in `eax`
|
||||||
|
|
||||||
|
#### tokenization
|
||||||
|
|
||||||
|
from a stream:
|
||||||
|
- `next-token`: stream, delimiter byte -> slice
|
||||||
|
- `skip-chars-matching`: stream, delimiter byte
|
||||||
|
- `skip-chars-not-matching`: stream, delimiter byte
|
||||||
|
|
||||||
|
from a slice:
|
||||||
|
- `next-token-from-slice`: start, end, delimiter byte -> slice
|
||||||
|
- Given a slice and a delimiter byte, returns a new slice inside the input
|
||||||
|
that ends at the delimiter byte.
|
||||||
|
|
||||||
|
- `skip-chars-matching-in-slice`: curr, end, delimiter byte -> new-curr (in `eax`)
|
||||||
|
- `skip-chars-not-matching-in-slice`: curr, end, delimiter byte -> new-curr (in `eax`)
|
Loading…
Reference in New Issue