# Mu reference

Mu programs are sequences of `fn` and `type` definitions.

## Functions

Define functions with the `fn` keyword. For example:
```
fn foo arg1: int, arg2: int -> result/eax: boolean
```
Functions contain `{}` blocks, `var` declarations, primitive statements and
calls to other functions. Primitive statements and function calls look
similar:
```
out1, out2, out3, ... <- operation inout1, inout2, inout3, ...
```
They can take any number of inouts and outputs, including 0. Statements
with 0 outputs also drop the `<-`.

Inouts can be either variables in memory, variables in registers, or
constants. Outputs are always variables in registers.

Inouts in memory can be either inputs or outputs (if they're addresses being
written to). Hence the name.

Primitives can often write to arbitrary output registers. User-defined
functions, however, require rigidly specified output registers.
## Variables, registers and memory

Declare local variables in a function using the `var` keyword.

You can declare local variables in either registers or memory (the stack). So
a `var` statement has two forms:
- `var x/eax: int <- copy 0`
- `var x: int`

Variables in registers must be initialized. Variables on the stack are
implicitly zeroed out.

Variables can live in six of the x86 processor's 32-bit _general-purpose_
registers:
- eax
- ebx
- ecx
- edx
- esi ('s' often a mnemonic for 'source')
- edi ('d' often a mnemonic for 'destination')

You can store several types in these registers:
- int
- boolean
- (addr T) (address into memory)
- byte (uses only 8 bits)
- code-point (Unicode)
- grapheme (code-point encoded in UTF-8)

There's one 32-bit type you _cannot_ store in these registers:
- float

Floats instead use eight separate 32-bit registers: xmm0, xmm1, ..., xmm7

Types that require more than 32 bits (4 bytes) cannot be stored in registers:
- (array T)
- (handle T)
- (stream T)
- slice
- any compound types you define using the `type` keyword

`T` here can be any type, including combinations of types. For example:
- (array int) -- an array of ints
- (addr int) -- an address to an int
- (handle int) -- a handle to an int
- (addr handle int) -- an address to a handle to an int
- (addr array handle int) -- an address to an array of handles to ints
- ...and so on.

Other miscellaneous restrictions:
- `byte` variables must be either in registers or on the heap, never local
  variables on the stack.
- `addr` variables can never "escape" a function either by being returned or
  by being written to a memory location. When you need that sort of thing,
  use a `handle` instead.
## Primitive statement types

These usually operate on variables with 32-bit types, with some restrictions
noted below. Most instructions with multiple args require types to match.

Notation in this section:
- `var/reg` indicates a variable in a register
- `var/xreg` indicates a variable in a floating-point register
- `var` without a `reg` indicates either a variable on the stack or
  dereferencing a variable in a (non-floating-point) register: `*var/reg`
- `n` indicates a literal integer. There are no floating-point literals.

### Moving values around

These instructions work with variables of any 32-bit type except `addr` and
`float`.
```
var/reg <- copy var2/reg2
copy-to var1, var2/reg
var/reg <- copy var2
var/reg <- copy n
copy-to var, n
```
Byte variables have their own instructions:
```
var/reg <- copy-byte var2/reg2
var/reg <- copy-byte *var2/reg2 # var2 must have type (addr byte)
copy-byte-to *var1/reg1, var2/reg2 # var1 must have type (addr byte)
```
Floating-point values can be copied as well, but only to floating-point
registers `xmm_`.
```
var/xreg <- copy var2/xreg2
copy-to var1, var2/xreg
var/xreg <- copy var2
var/xreg <- copy *var2/reg2 # var2 must have type (addr float) and live in a general-purpose register
```
There's no way to copy a literal to a floating-point register. However,
there are a few ways to convert non-float values in general-purpose registers
or memory into floats.
```
var/xreg <- convert var2/reg2
var/xreg <- convert var2
var/xreg <- convert *var2/reg2
```
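For instance, since there are no floating-point literals, one way to get the
value 3.0 into a floating-point register is to go through an integer. This is
only a minimal sketch of a fragment inside a function body; the variable names
`n` and `f` are made up for illustration:

```
var n/eax: int <- copy 3
var f/xmm0: float <- convert n    # f now holds 3.0
```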
Correspondingly, there are ways to convert floats into integers.
```
var/reg <- convert var2/xreg2
var/reg <- convert var2
var/reg <- convert *var2/reg2
var/reg <- truncate var2/xreg2
var/reg <- truncate var2
var/reg <- truncate *var2/reg2
```
### Comparing values
These instructions work with variables of any 32-bit type. `addr` variables
can only be compared to 0.
```
compare var1, var2/reg
compare var1/reg, var2
compare var/eax, n
compare var, n
```
Floating-point numbers cannot be compared to literals, and the register must
come first.
```
compare var1/xreg1, var2/xreg2
compare var1/xreg1, var2
```
### Branches
Immediately after a `compare` instruction you can branch on its result. For
example:
```
break-if-=
```
This instruction will jump to after the enclosing `{}` block if the previous
`compare` detected equality. Here's the list of conditional and unconditional
`break` instructions:
```
break
break-if-=
break-if-!=
break-if-<
break-if->
break-if-<=
break-if->=
```
Similarly, you can jump back to the start of the enclosing `{}` block with
`loop`. Here's the list of `loop` instructions.
```
loop
loop-if-=
loop-if-!=
loop-if-<
loop-if->
loop-if-<=
loop-if->=
```
Additionally, there are special variants for comparing `addr` and `float`
values, which results in the following comprehensive list of jumps:
```
break
break-if-=
break-if-!=
break-if-<          break-if-addr<          break-if-float<
break-if->          break-if-addr>          break-if-float>
break-if-<=         break-if-addr<=         break-if-float<=
break-if->=         break-if-addr>=         break-if-float>=

loop
loop-if-=
loop-if-!=
loop-if-<           loop-if-addr<           loop-if-float<
loop-if->           loop-if-addr>           loop-if-float>
loop-if-<=          loop-if-addr<=          loop-if-float<=
loop-if->=          loop-if-addr>=          loop-if-float>=
```
One final property all these jump instructions share: they can take an
optional block name to jump to. For example:
```
a: {
  ...
  break a     #----------|
  ...         #          |
}             #       <--|

a: {          #       <--|
  ...         #          |
  b: {        #          |
    ...       #          |
    loop a    #----------|
    ...
  }
  ...
}
```
However, there's no way to jump to a block that doesn't contain the `loop` or
`break` instruction.
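
Putting `compare`, `break` and `loop` together, a counting loop might look
roughly like the sketch below. It is a fragment of a function body, not a
complete program. The variable name `i` and the bound 10 are arbitrary, and
`increment` is one of the integer arithmetic instructions from this reference.

```
var i: int            # on the stack, so implicitly zeroed
{
  compare i, 10
  break-if->=         # leave the block once i >= 10
  # ... loop body goes here ...
  increment i
  loop                # jump back to the start of the block
}
```

The `compare` has to come immediately before the conditional `break`, so the
loop body sits after the exit check rather than between the two.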
### Integer arithmetic

These instructions require variables of non-`addr`, non-`float` types.

Add:
```
var1/reg1 <- add var2/reg2
var/reg <- add var2
add-to var1, var2/reg # var1 += var2
var/reg <- add n
add-to var, n
```
Subtract:
```
var1/reg1 <- subtract var2/reg2
var/reg <- subtract var2
subtract-from var1, var2/reg # var1 -= var2
var/reg <- subtract n
subtract-from var, n
```
Add one:
```
var/reg <- increment
increment var
```
Subtract one:
```
var/reg <- decrement
decrement var
```
Multiply:
```
var/reg <- multiply var2
```
The result of a multiply must be a register.

Negate:
```
var1/reg1 <- negate
negate var
```
### Floating-point arithmetic
Operations on `float` variables include a few we've seen before and some new
ones. Notice here that we mostly use floating-point registers `xmm_`, but
still use the general-purpose registers when dereferencing variables of type
`(addr float)`.
```
var/xreg <- add var2/xreg2
var/xreg <- add var2
var/xreg <- add *var2/reg2

var/xreg <- subtract var2/xreg2
var/xreg <- subtract var2
var/xreg <- subtract *var2/reg2

var/xreg <- multiply var2/xreg2
var/xreg <- multiply var2
var/xreg <- multiply *var2/reg2

var/xreg <- divide var2/xreg2
var/xreg <- divide var2
var/xreg <- divide *var2/reg2

var/xreg <- reciprocal var2/xreg2
var/xreg <- reciprocal var2
var/xreg <- reciprocal *var2/reg2

var/xreg <- square-root var2/xreg2
var/xreg <- square-root var2
var/xreg <- square-root *var2/reg2

var/xreg <- inverse-square-root var2/xreg2
var/xreg <- inverse-square-root var2
var/xreg <- inverse-square-root *var2/reg2

var/xreg <- min var2/xreg2
var/xreg <- min var2
var/xreg <- min *var2/reg2

var/xreg <- max var2/xreg2
var/xreg <- max var2
var/xreg <- max *var2/reg2
```
Two instructions in the above list are approximate. According to the Intel
manual, `reciprocal` and `inverse-square-root` [go off the rails around the
fourth decimal place](linux/x86_approx.md). If you need more precision, use
`divide` instead (together with `square-root` for inverse square roots).

### Bitwise boolean operations

These require variables of non-`addr`, non-`float` types.

And:
```
var1/reg1 <- and var2/reg2
var/reg <- and var2
and-with var1, var2/reg
var/reg <- and n
and-with var, n
```
Or:
```
var1/reg1 <- or var2/reg2
var/reg <- or var2
or-with var1, var2/reg
var/reg <- or n
or-with var, n
```
Not:
```
var1/reg1 <- not
not var
```
Xor:
```
var1/reg1 <- xor var2/reg2
var/reg <- xor var2
xor-with var1, var2/reg
var/reg <- xor n
xor-with var, n
```
### Shifts
Shifts require variables of non-`addr`, non-float types.
```
var/reg <- shift-left n
var/reg <- shift-right n
var/reg <- shift-right-signed n
shift-left var, n
shift-right var, n
shift-right-signed var, n
```
Shifting bits left always inserts zeros on the right.
Shifting bits right inserts zeros on the left by default.
A _signed_ shift right duplicates the leftmost bit, thereby preserving the
sign of an integer.
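
To make the difference concrete, here is a small sketch of a fragment inside a
function body. The variable names and values are chosen purely for
illustration; `negate` is used to produce a negative number.

```
var x/eax: int <- copy 5
x <- shift-left 3              # 5 * 8 = 40

var y/ecx: int <- copy 16
y <- negate                    # y is now -16
y <- shift-right-signed 2      # y is now -4; the sign bit was duplicated
```

A plain `shift-right` by 2 on -16 would instead insert zeros on the left and
yield the large positive value 0x3ffffffc.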
## More complex instructions on more complex types
These instructions work with any type `T`. As before we use `/reg` here to
indicate when a variable must live in a register. We also include type
constraints after a `:`.
### Addresses and handles
You can compute the address of any variable in memory (never in registers):
```
var/reg: (addr T) <- address var2: T
```
As mentioned up top, `addr` variables can never escape the function where
they're computed. You can't store them on the heap, or in compound types.

To manage long-lived addresses, use a `handle` and _allocate_ its payload on
the heap.
```
allocate var: (addr handle T) # var can be in either register or memory
```
Handles can be copied and stored without restriction. However, they're too
large to fit in a register. You also can't access their payload directly; you
have to first convert them into a short-lived `addr` using _lookup_.
```
var y/eax: (addr T) <- lookup x: (handle T)
```
The output of `lookup` is always returned in register `eax`. Many other
function calls do the same thing. In practice, this means `eax` ends up being
a temporary location used to store lots of variables in quick succession.

Since handles are large compound types, there's a special helper for comparing
them:
```
var/eax: boolean <- handle-equal? var1: (handle T), var2: (handle T)
```
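
As a rough sketch of how `address`, `allocate` and `lookup` fit together, the
fragment below (meant to sit inside a function body) allocates an int on the
heap and writes to it. The names `x`, `x-ah` and `x-addr` and the value 34 are
illustrative assumptions, not part of any API.

```
var x: (handle int)
var x-ah/ecx: (addr handle int) <- address x
allocate x-ah                                # x now refers to an int on the heap
var x-addr/eax: (addr int) <- lookup *x-ah   # short-lived address of the payload
copy-to *x-addr, 34
```

Because `x-addr` lives in `eax`, it should be used right away, before another
call overwrites `eax`.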
### Arrays
Arrays are declared in two ways:
1. On the stack with a literal size:
```
var x: (array int 3)
```
2. On the heap with a potentially variable size. For example:
```
var x: (handle array int)
var x-ah/eax: (addr handle array int) <- address x
populate x-ah, 8
```
The `8` here can also be an int in a register or memory.

You can compute the length of an array, though you'll need an `addr` to do so:
```
var/reg: int <- length arr/reg: (addr array T)
```
To read from or write to an array, use `index` to first obtain an address to
read from or modify:
```
var/reg: (addr T) <- index arr/reg: (addr array T), n
var/reg: (addr T) <- index arr: (array T len), n
```
As with `n` in our notation, `len` here is required to be a literal.

The index requested can also be a variable in a register, with one caveat:
```
var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: int
var/reg: (addr T) <- index arr: (array T len), idx/reg: int
```
The caveat: the size of T must be 1, 2, 4 or 8 bytes. For other sizes of T
you'll need to split up the work, performing a `compute-offset` before the
`index`.
```
var/reg: (offset T) <- compute-offset arr: (addr array T), idx/reg: int # arr can be in reg or mem
var/reg: (offset T) <- compute-offset arr: (addr array T), idx: int # arr can be in reg or mem
```
The result of a `compute-offset` statement can be passed to `index`:
```
var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: (offset T)
```
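
To tie the array instructions together, the following fragment (inside a
function body) sketches writing to one element of a stack array and then
reading the array's length. The names `storage`, `arr`, `elem` and `len` are
made up, and the sketch assumes the address of a sized stack array can be held
in an `(addr array int)`.

```
var storage: (array int 3)                        # on the stack, zeroed
var elem/eax: (addr int) <- index storage, 1      # address of element 1
copy-to *elem, 42

var arr/esi: (addr array int) <- address storage
var len/ecx: int <- length arr                    # 3
```

Since `int` is 4 bytes, a register index would also work here without
`compute-offset`.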
### Stream operations
A common use for arrays is as buffers. Save a few items to a scratch space and
then process them. This pattern is so common (we use it in files) that there's
special support for it with a built-in type: `stream`.

Streams are like arrays in many ways. You can initialize them with a length on
the stack:
```
var x: (stream int 3)
```
You can also populate them on the heap:
```
var x: (handle stream int)
var x-ah/eax: (addr handle stream int) <- address x
populate-stream x-ah, 8
```
However, streams don't provide random access with an `index` instruction.
Instead, you write to them sequentially, and read back what you wrote.
```
read-from-stream s: (addr stream T), out: (addr T)
write-to-stream s: (addr stream T), in: (addr T)
```
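
As a hedged illustration, the fragment below (inside a function body) writes
one int to a stream on the stack and reads it back. All variable names and the
value 7 are assumptions made for the example.

```
var s-storage: (stream int 3)
var s/esi: (addr stream int) <- address s-storage

var x: int
copy-to x, 7
var x-addr/eax: (addr int) <- address x
write-to-stream s, x-addr       # copy x into the stream

var y: int
var y-addr/ecx: (addr int) <- address y
read-from-stream s, y-addr      # y is now 7
```

Both instructions take addresses, so the values being written and read live in
memory rather than in registers.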
Streams of bytes are particularly common for managing Unicode text, and there
are a few functions to help with them:
```
write s: (addr stream byte), u: (addr array byte) # write u to s, abort if full
overflow?/eax: boolean <- try-write s: (addr stream byte), u: (addr array byte)
write-stream dest: (addr stream byte), src: (addr stream byte)
# bytes
append-byte s: (addr stream byte), var: int # write lower byte of var
var/eax: byte <- read-byte s: (addr stream byte)
# 32-bit graphemes encoded in UTF-8
write-grapheme out: (addr stream byte), g: grapheme
g/eax: grapheme <- read-grapheme in: (addr stream byte)
```
You can check if a stream is empty or full:
```
var/eax: boolean <- stream-empty? s: (addr stream)
var/eax: boolean <- stream-full? s: (addr stream)
```
You can clear streams:
```
clear-stream f: (addr stream T)
```
You can also rewind them to reread their contents:
```
rewind-stream f: (addr stream T)
```
## Compound types
Primitive types can be combined together using the `type` keyword. For
example:
```
type point {
  x: int
  y: int
}
```
Mu programs are currently sequences of `fn` and `type` definitions.

Compound types can't include `addr` types for safety reasons (use `handle`
instead, which is described above). They also can't currently include `array`,
`stream` or `byte` types. Since arrays and streams carry their size with them,
supporting them in compound types complicates variable initialization. Instead
of defining them inline in a type definition, define a `handle` to them. Bytes
shouldn't be used for anything but UTF-8 strings.

To access a field within a compound type, use the `get` instruction. There are
two forms. You need either a variable of the type itself (say `T`) in memory,
or a variable of type `(addr T)` in a register.
```
var/reg: (addr T_f) <- get var/reg: (addr T), f
var/reg: (addr T_f) <- get var: T, f
```
The `f` here is the field name from the `type` definition, and `T_f` must be
the type of `f` in that definition. For example, some legal instructions for
the definition of `point` above:
```
var a/eax: (addr int) <- get p, x
var a/eax: (addr int) <- get p, y
```
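
Since `get` returns an address, writing to a field is a matter of storing
through that address. A minimal sketch, assuming a local `point` named `p` on
the stack (the names `px` and `py` and the values are arbitrary):

```
var p: point                          # on the stack, so both fields start at 0
var px/eax: (addr int) <- get p, x
copy-to *px, 3
var py/ecx: (addr int) <- get p, y
copy-to *py, 4
```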
You can clear compound types using the `clear-object` function:
```
clear-object var: (addr T)
```
You can shallow-copy compound types using the `copy-object` function:
```
copy-object src: (addr T), dest: (addr T)
```