Re-sync markdown files with mu-normie fork.
Kartik Agaram 2020-07-12 16:37:58 -07:00
parent ac8b37f9a8
commit 7817fdb29c
5 changed files with 265 additions and 259 deletions


@ -7,14 +7,14 @@ Mu is not designed to operate in large clusters providing services for
millions of people. Mu is designed for _you_, to run one computer. (Or a few.)
Running the code you want to run, and nothing else.
```sh
$ git clone https://github.com/akkartik/mu
$ cd mu
$ ./translate_mu apps/ex2.mu # emit a.elf
$ ./a.elf # adds 3 and 4
$ echo $?
7
```
[![Build Status](https://api.travis-ci.org/akkartik/mu.svg?branch=master)](https://travis-ci.org/akkartik/mu)
@ -82,26 +82,26 @@ result in good error messages.
Once generated, ELF binaries can be packaged up with a Linux kernel into a
bootable disk image:
```sh
$ ./translate_mu apps/ex2.mu # emit a.elf
# dependencies
$ sudo apt install build-essential flex bison wget libelf-dev libssl-dev xorriso
$ tools/iso/linux a.elf
$ qemu-system-x86_64 -m 256M -cdrom mu_linux.iso -boot d
```
The disk image also runs on [any cloud server that supports custom images](http://akkartik.name/post/iso-on-linode).
Mu also runs on the minimal hobbyist OS [Soso](https://github.com/ozkl/soso).
(Requires graphics and sudo access. Currently doesn't work on a cloud server.)
```sh
$ ./translate_mu apps/ex2.mu # emit a.elf
# dependencies
$ sudo apt install build-essential util-linux nasm xorriso # maybe also dosfstools and mtools
$ tools/iso/soso a.elf # requires sudo
$ qemu-system-i386 -cdrom mu_soso.iso
```
## Syntax
@ -121,16 +121,16 @@ Here's an example program in Mu:
Here's an example program in SubX:
```sh
== code
Entry:
# ebx = 1
bb/copy-to-ebx 1/imm32
# increment ebx
43/increment-ebx
# exit(ebx)
e8/call syscall_exit/disp32
```
[More details on SubX syntax →](subx.md)

mu.md

@ -46,13 +46,13 @@ Zooming out from single statements, here's a complete sample program in Mu:
Mu programs are lists of functions. Each function has the following form:
```
fn _name_ _inout_ ... -> _output_ ... {
_statement_
_statement_
...
}
```
Each function has a header line, and some number of statements, each on a
separate line. Headers describe inouts and outputs. Inouts can't be registers,
@ -64,15 +64,15 @@ outputs in registers, and modify inouts passed in by reference. In addition,
there's one more constraint: output registers must match the function header.
For example:
```
fn f -> x/eax: int {
...
}
fn g {
a/eax <- f # ok
a/ebx <- f # wrong
}
```
The function `main` is special; it is where the program starts running. It
must always return a single int in register `ebx` (as the exit status of the
@ -92,23 +92,23 @@ and `}`, each alone on a line.
Blocks can nest:
```
{
  _statements_
  {
    _more statements_
  }
  _more statements_
}
```
Blocks can be named (with the name ending in a `:` on the same line as the
`{`):
```
$name: {
_statements_
}
```
Further down we'll see primitive statements for skipping or repeating blocks.
Besides control flow, the other use for blocks is...
@ -119,10 +119,10 @@ Functions can define new variables at any time with the keyword `var`. There
are two variants of the `var` statement, for defining variables in registers
or memory.
```
var name: type
var name/reg: type <- ...
```
Variables on the stack are never explicitly initialized. (They're always implicitly
zeroed out.) Variables in registers are always initialized.
@ -142,54 +142,54 @@ Here is the list of arithmetic primitive operations supported by Mu. The name
`n` indicates a literal integer rather than a variable, and `var/reg` indicates
a variable in a register.
```
var/reg <- increment
increment var
var/reg <- decrement
decrement var
var1/reg1 <- add var2/reg2
var/reg <- add var2
add-to var1, var2/reg
var/reg <- add n
add-to var, n

var1/reg1 <- sub var2/reg2
var/reg <- sub var2
sub-from var1, var2/reg
var/reg <- sub n
sub-from var, n

var1/reg1 <- and var2/reg2
var/reg <- and var2
and-with var1, var2/reg
var/reg <- and n
and-with var, n

var1/reg1 <- or var2/reg2
var/reg <- or var2
or-with var1, var2/reg
var/reg <- or n
or-with var, n

var1/reg1 <- xor var2/reg2
var/reg <- xor var2
xor-with var1, var2/reg
var/reg <- xor n
xor-with var, n

var/reg <- copy var2/reg2
copy-to var1, var2/reg
var/reg <- copy var2
var/reg <- copy n
copy-to var, n

compare var1, var2/reg
compare var1/reg, var2
compare var/eax, n
compare var, n

var/reg <- multiply var2
```
Any statement above that takes a variable in memory can be replaced with a
dereference (`*`) of an address variable (of type `(addr ...)`) in a register.
@ -211,11 +211,11 @@ Since most x86 instructions implicitly load 32 bits at a time from memory,
variables of type 'byte' are only allowed in registers, not on the stack. Here
are the possible statements for reading bytes from and writing bytes to memory:
```
var/reg <- copy-byte var2/reg2 # var: byte, var2: byte
var/reg <- copy-byte *var2/reg2 # var: byte, var2: (addr byte)
copy-byte-to *var1/reg1, var2/reg2 # var1: (addr byte), var2: byte
```
In addition, variables of type 'byte' are restricted to (the lowest bytes of)
just 4 registers: eax, ecx, edx and ebx.
@ -228,9 +228,9 @@ jump to the beginning of the containing block.
All jumps can take an optional label starting with '$':
```
loop $foo
```
This instruction jumps to the beginning of the block called $foo. The corresponding
`break` jumps to the end of the block. Either jump statement must lie somewhere
@ -239,83 +239,83 @@ blocks with restraint; jumps to places far away can get confusing.)
There are two unconditional jumps:
```
loop
loop label
break
break label
```
The remaining jump instructions are all conditional. Conditional jumps rely on
the result of the most recently executed `compare` instruction. (To keep
programs easy to read, keep compare instructions close to the jump that uses
them.)
```
break-if-=
break-if-= label
break-if-!=
break-if-!= label
```
Inequalities are similar, but have unsigned and signed variants. For simplicity,
always use signed integers; use the unsigned variants only to compare addresses.
```
break-if-<
break-if-< label
break-if->
break-if-> label
break-if-<=
break-if-<= label
break-if->=
break-if->= label

break-if-addr<
break-if-addr< label
break-if-addr>
break-if-addr> label
break-if-addr<=
break-if-addr<= label
break-if-addr>=
break-if-addr>= label
```
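
To see why the signed and `addr` families of inequalities can disagree, here is a small Python sketch. It is a model rather than Mu code, and the helper names (`as_signed`, `less_signed`, `less_unsigned`) are made up for illustration. It compares the same 32-bit pattern both ways:

```python
MASK32 = 0xFFFFFFFF

def as_signed(word):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word

def less_signed(a, b):
    """Roughly what a signed `<` jump tests after comparing a with b."""
    return as_signed(a) < as_signed(b)

def less_unsigned(a, b):
    """Roughly what an `addr<` jump tests: the same bits, read as unsigned."""
    return (a & MASK32) < (b & MASK32)

minus_one = 0xFFFFFFFF  # the bit pattern for -1 as a signed int
print(less_signed(minus_one, 1))    # True: -1 < 1
print(less_unsigned(minus_one, 1))  # False: 0xffffffff is a very high address
```

Mixing the two readings for the same value is an easy way to get a branch that goes the wrong way, which is presumably why the advice is to stick to signed comparisons except for addresses.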
Similarly, conditional loops:
```
loop-if-=
loop-if-= label
loop-if-!=
loop-if-!= label

loop-if-<
loop-if-< label
loop-if->
loop-if-> label
loop-if-<=
loop-if-<= label
loop-if->=
loop-if->= label

loop-if-addr<
loop-if-addr< label
loop-if-addr>
loop-if-addr> label
loop-if-addr<=
loop-if-addr<= label
loop-if-addr>=
loop-if-addr>= label
```
## Addresses
Passing objects by reference requires the `address` operation, which returns
an object of type `addr`.
```
var/reg: (addr T) <- address var2: T
```
Here `var2` can't live in a register.
@ -325,24 +325,24 @@ Mu arrays are size-prefixed so that operations on them can check bounds as
necessary at run-time. The `length` statement returns the number of elements
in an array.
```
var/reg: int <- length arr/reg: (addr array T)
```
The `index` statement takes an `addr` to an `array` and returns an `addr` to
one of its elements, that can be read from or written to.
```
var/reg: (addr T) <- index arr/reg: (addr array T), n
var/reg: (addr T) <- index arr: (array T sz), n
```
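
As a rough mental model of what a size prefix buys you, here is a Python sketch of `length` and a bounds-checked `index`. The layout is an assumption made for this sketch, not something the text above specifies: a 4-byte little-endian prefix holding the payload size in bytes, followed by 4-byte elements. The function names are likewise illustrative.

```python
import struct

ELEM_SIZE = 4    # assumption: elements are 4-byte ints
PREFIX_SIZE = 4  # assumption: the size prefix is 4 bytes

def make_array(values):
    """Lay out values after a prefix holding the payload size in bytes."""
    payload = b"".join(struct.pack("<i", v) for v in values)
    return struct.pack("<I", len(payload)) + payload

def length(arr):
    """Number of elements, derived from the size prefix."""
    (size_in_bytes,) = struct.unpack_from("<I", arr, 0)
    return size_in_bytes // ELEM_SIZE

def index(arr, n):
    """Byte position of element n within arr, after a bounds check."""
    if not 0 <= n < length(arr):
        raise IndexError(f"index {n} out of bounds for length {length(arr)}")
    return PREFIX_SIZE + n * ELEM_SIZE

arr = make_array([10, 20, 30])
print(length(arr))                                      # 3
print(struct.unpack_from("<i", arr, index(arr, 1))[0])  # 20
```

Because the size travels with the array, the check in `index` needs nothing beyond the array itself.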
The index can also be a variable in a register, with a caveat:
```
var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: int
var/reg: (addr T) <- index arr: (array T sz), idx/reg: int
```
The caveat: the size of T must be 1, 2, 4 or 8 bytes. The x86 instruction set
has complex addressing modes that can index into an array in a single instruction
@ -351,30 +351,30 @@ in these situations.
For types in general you'll need to split up the work, performing a `compute-offset`
before the `index`.
```
var/reg: (offset T) <- compute-offset arr: (addr array T), idx/reg: int # arr can be in reg or mem
var/reg: (offset T) <- compute-offset arr: (addr array T), idx: int # arr can be in reg or mem
```
The `compute-offset` statement returns a value of type `(offset T)` after
performing any necessary bounds checking. Now the offset can be passed to
`index` as usual:
```
var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: (offset T)
```
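
The split between `index` and `compute-offset` can be pictured with a short Python sketch. This is only a model of the arithmetic, and the names `HARDWARE_SCALES`, `needs_compute_offset` and `compute_offset` are invented for it:

```python
HARDWARE_SCALES = (1, 2, 4, 8)  # element sizes x86 addressing can scale by directly

def needs_compute_offset(elem_size):
    """True when the element size is not one the hardware can scale by."""
    return elem_size not in HARDWARE_SCALES

def compute_offset(idx, elem_size, num_elems):
    """Turn an element index into a byte offset, with a bounds check."""
    if not 0 <= idx < num_elems:
        raise IndexError(f"index {idx} out of bounds for length {num_elems}")
    return idx * elem_size

print(needs_compute_offset(4))   # False: a register index works directly
print(needs_compute_offset(12))  # True: multiply first, then index
print(compute_offset(2, 12, 5))  # 24
```

For a hypothetical 12-byte element type, the multiplication has to happen as its own step, and the resulting offset is what gets handed to `index`.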
## Compound types
Primitive types can be combined together using the `type` keyword. For
example:
```
type point {
x: int
y: int
}
```
Mu programs are currently sequences of `fn` and `type` definitions.
@ -382,19 +382,19 @@ To access within a compound type, use the `get` instruction. There are two
forms. You need either a variable of the type itself (say `T`) in memory, or a
variable of type `(addr T)` in a register.
```
var/reg: (addr T_f) <- get var/reg: (addr T), f
var/reg: (addr T_f) <- get var: T, f
```
The `f` here is the field name from the `type` definition, and `T_f` must
match the type of `f` in the `type` definition. For example, some legal
instructions for the definition of `point` above:
```
var a/eax: (addr int) <- get p, x
var a/eax: (addr int) <- get p, y
```
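
One way to picture what `get` returns is as the container's address plus the field's offset. The Python sketch below models that for `point`, assuming 4-byte ints packed in declaration order; the address `0x1000` and the helper names are hypothetical.

```python
INT_SIZE = 4  # assumption: an int occupies 4 bytes

# mirrors `type point { x: int  y: int }`
POINT = [("x", INT_SIZE), ("y", INT_SIZE)]

def field_offset(fields, name):
    """Byte offset of field `name`, assuming fields are packed in order."""
    offset = 0
    for field_name, size in fields:
        if field_name == name:
            return offset
        offset += size
    raise KeyError(name)

def get(base_address, fields, name):
    """Address of the field: the container's address plus the field's offset."""
    return base_address + field_offset(fields, name)

p = 0x1000  # hypothetical address of a point
print(hex(get(p, POINT, "x")))  # 0x1000
print(hex(get(p, POINT, "y")))  # 0x1004
```

An unknown field name raises `KeyError` here, loosely standing in for the error you would expect at translation time.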
## Handles for safe access to the heap
@ -407,9 +407,9 @@ security issues or hard-to-debug misbehavior.
To actually _use_ a `handle`, we have to turn it into an `addr` first using
the `lookup` statement.
```
var y/reg: (addr T) <- lookup x
```
Now operate on the `addr` as usual, safe in the knowledge that you can later
recover any writes to its payload from `x`.
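
The Python sketch below is one way to think about `lookup`. It rests on assumptions the text above does not spell out: that a handle pairs an address with an allocation id, that the payload carries the same id, and that freeing a payload invalidates its id. The `Heap` class and all its members are invented for illustration.

```python
class Heap:
    def __init__(self):
        self.payloads = {}          # address -> [alloc_id, value]
        self.next_address = 0x1000  # hypothetical starting address
        self.next_alloc_id = 1

    def allocate(self, value):
        """Create a payload and return a handle: (alloc_id, address)."""
        address, alloc_id = self.next_address, self.next_alloc_id
        self.payloads[address] = [alloc_id, value]
        self.next_address += 0x10
        self.next_alloc_id += 1
        return (alloc_id, address)

    def free(self, handle):
        """Invalidate the payload so stale handles can be detected."""
        _, address = handle
        self.payloads[address][0] = 0

    def lookup(self, handle):
        """Return the payload (standing in for an addr) if the ids match."""
        alloc_id, address = handle
        payload = self.payloads[address]
        if payload[0] != alloc_id:
            raise RuntimeError("lookup of a stale handle")
        return payload

heap = Heap()
x = heap.allocate(42)
y = heap.lookup(x)
y[1] = 43                  # write through the looked-up address
print(heap.lookup(x)[1])   # 43: the write is visible through the handle
heap.free(x)
try:
    heap.lookup(x)
except RuntimeError as err:
    print(err)             # lookup of a stale handle
```

The point of the model is the last step: a plain address would silently read freed memory, whereas the id check turns that into an immediate, visible failure.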
@ -433,26 +433,26 @@ Try to avoid mixing these use cases.
You can copy handles to another variable on the stack like this:
```
var x: (handle T)
# ..some code initializing x..
var y/eax: (addr handle T) <- address ...
copy-handle x, y
```
You can also save handles inside compound types like this:
```
var y/reg: (addr handle T_f) <- get var: (addr T), f
copy-handle-to *y, x
```
Or this:
```
var y/reg: (addr handle T) <- index arr: (addr array handle T), n
copy-handle-to *y, x
```
## Conclusion

subx.md

@ -6,16 +6,16 @@ is implemented in SubX and also emits SubX code.
Here's an example program in SubX that adds 1 and 1 and returns the result to
the parent shell process:
```sh
== code
Entry:
# ebx = 1
bb/copy-to-ebx 1/imm32
# increment ebx
43/increment-ebx
# exit(ebx)
e8/call syscall_exit/disp32
```
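
Since each SubX instruction starts with its opcode byte, the first two instructions above can be turned into machine code by hand. The Python sketch below does that, assuming standard 32-bit x86 encoding (`bb` followed by a little-endian 32-bit immediate, then the single byte `43`). The `emit` helper is made up for this sketch. The `call` is left out because its displacement depends on where `syscall_exit` ends up.

```python
def emit(opcode, imm32=None):
    """One opcode byte, optionally followed by a 32-bit little-endian immediate."""
    code = bytes([opcode])
    if imm32 is not None:
        code += imm32.to_bytes(4, "little")
    return code

# bb/copy-to-ebx 1/imm32, then 43/increment-ebx
program = emit(0xBB, imm32=1) + emit(0x43)
print(program.hex(" "))  # bb 01 00 00 00 43
```

Only the text before the first `/` in each word contributes bytes; the rest is metadata for the reader and for error checking.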
## The syntax of SubX instructions
@ -78,9 +78,9 @@ simpler. It comes from exactly one of the following argument types:
Putting all this together, here's an example that adds the integer in `eax` to
the one at address `edx`:
```
01/add %edx 0/r32/eax
```
## The syntax of SubX programs


@ -41,9 +41,9 @@ contains `4`. Rather than encoding register `esp`, it means the address is
provided by three _whole new_ arguments (`/base`, `/index` and `/scale`) in a
_totally_ different way (where `<<` is the left-shift operator):
```
reg/mem = *(base + (index << scale))
```
(There are a couple more exceptions ☹; see [Table 2-2](modrm.pdf) and [Table 2-3](sib.pdf)
of the Intel manual for the complete story.)
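
As a quick check on the address formula, here is the arithmetic in Python. The function name and the sample values are hypothetical; the only fact relied on is that `/scale` is a 2-bit field, so the index is multiplied by 1, 2, 4 or 8.

```python
def effective_address(base, index, scale):
    """base + (index << scale), where scale is 0..3 (multiply by 1, 2, 4 or 8)."""
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale must fit in 2 bits")
    return base + (index << scale)

# hypothetical values: 4-byte elements starting at 0x2000, element number 3
print(hex(effective_address(0x2000, 3, 2)))  # 0x200c
```

The instruction then reads or writes memory at the computed address, which is what the `*` in the formula stands for.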
@ -130,38 +130,38 @@ This repo includes two translators for bare SubX. The first is [the bootstrap
translator](bootstrap.md) implemented in C++. In addition, you can use SubX to
translate itself. For example, running natively on Linux:
```sh
# generate translator phases using the C++ translator
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/hex.subx -o hex
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/survey.subx -o survey
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/pack.subx -o pack
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/assort.subx -o assort
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/dquotes.subx -o dquotes
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/tests.subx -o tests
$ chmod +x hex survey pack assort dquotes tests

# use the generated translator phases to translate SubX programs
$ cat init.linux apps/ex1.subx |./tests |./dquotes |./assort |./pack |./survey |./hex > a.elf
$ chmod +x a.elf
$ ./a.elf
$ echo $?
42

# or, automating the above steps
$ ./translate_subx init.linux apps/ex1.subx
$ ./a.elf
$ echo $?
42
```
Or, running in a VM on other platforms (much slower):
```sh
$ ./translate_subx_emulated init.linux apps/ex1.subx # generates identical a.elf to above
$ ./bootstrap run a.elf
$ echo $?
42
```
## Resources


@ -25,20 +25,24 @@ rudimentary but hopefully still workable toolkit:
- Generate a trace for the failing test while running your program in emulated
mode (`bootstrap run`):
```
$ ./bootstrap translate input.subx -o binary
$ ./bootstrap --trace run binary arg1 arg2 2>trace
```
The ability to generate a trace is the essential reason for the existence of
`bootstrap run` mode. It gives far better visibility into program internals than
running natively.
- As a further refinement, it is possible to render label names in the trace
by adding a second flag to the `bootstrap translate` command:
```
$ ./bootstrap --debug translate input.subx -o binary
$ ./bootstrap --trace run binary arg1 arg2 2>trace
```
`bootstrap --debug translate` emits a mapping from label to address in a file
called `labels`. `bootstrap --trace run` reads in the `labels` file if
it exists and prints out any matching label name as it traces each instruction
@ -57,9 +61,11 @@ rudimentary but hopefully still workable toolkit:
some loop. Function names get similar `run: == label` lines.
- One trick when emitting traces with labels:
```
$ grep label trace
```
This is useful for quickly showing you the control flow for the run, and the
function executing when the error occurred. I find it useful to start with
this information, only looking at the complete trace after I've gotten