# Mu Syntax

Here are two valid statements in Mu:

```
increment x
y <- increment
```

Understanding when to use one vs the other is the critical idea in Mu. In
short, the former increments a value in memory, while the latter increments a
value in a register.

Most languages start from some syntax and do what it takes to implement it.
Mu, however, is designed as a safe way to program in [a regular subset of
32-bit x86 machine code](subx.md), _satisficing_ rather than optimizing for a
clean syntax. To keep the mapping to machine code lightweight, Mu exclusively
uses statements. Most statements map to a single instruction of machine code.

Since the x86 instruction set restricts how many memory locations an instruction
can use, Mu makes registers explicit as well. Variables must be explicitly
mapped to specific registers; otherwise they live in memory. While you have to
do your own register allocation, Mu will helpfully point out when you get it
wrong.

Statements consist of 3 parts: the operation, optional _inouts_ and optional
_outputs_. Outputs come before the operation name and `<-`.

Outputs are always registers; memory locations that need to be modified are
passed in by reference in inouts.
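
To make the distinction concrete, here's a minimal sketch that combines the two
statements from the top of this page with the `var` statements described under
"Local variables". The function name `example` and the variable names are made
up for illustration.

```
fn example {
  var x: int                 # no register given, so `x` lives in memory
  var y/eax: int <- copy 0   # `y` is mapped to register `eax`
  increment x                # memory location: passed in as an inout
  y <- increment             # register: written as an output
}
```

The swapped forms, `x <- increment` and `increment y`, are not valid here,
since `x` is in memory and `y` is in a register.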

So Mu programmers need to make two new categories of decisions: whether to
define variables in registers or memory, and whether to put variables to the
left or right. There's always exactly one way to write any given operation. In
return for this overhead you get a lightweight and future-proof stack. And Mu
will provide good error messages to support you.

Further down, this page enumerates all available primitives in Mu, and [a
separate page](http://akkartik.github.io/mu/html/mu_instructions.html)
describes how each primitive is translated to machine code. There is also a
useful list of pre-defined functions (implemented in unsafe machine code) in
[400.mu](http://akkartik.github.io/mu/html/400.mu.html) and
[vocabulary.md](vocabulary.md).

## Functions and calls

Zooming out from single statements, here's a complete sample program in Mu:

<img alt='ex2.mu' src='html/ex2.mu.png'>

Mu programs are lists of functions. Each function has the following form:

```
fn _name_ _inout_ ... -> _output_ ... {
  _statement_
  _statement_
  ...
}
```

Each function has a header line, and some number of statements, each on a
separate line. Headers describe inouts and outputs. Inouts can't be registers,
and outputs _must_ be registers. Outputs can't take names. In the above
example, the outputs of both `do-add` and `main` have type `int` and are
available in register `ebx` at the end of the respective calls.

The above program also demonstrates a function call (to the function `do-add`).
Function calls look the same as primitive statements: they can return (multiple)
outputs in registers, and modify inouts passed in by reference. In addition,
there's one more constraint: output registers must match the function header.
For example:

```
fn f -> _/eax: int {
  ...
}
fn g {
  a/eax <- f  # ok
  a/ebx <- f  # wrong; `a` must be in register `eax`
}
```

You can exit a function at any time with the `return` instruction. Give it the
right number of arguments, and it'll assign them respectively to the function's
outputs before jumping back to the caller.
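
As a sketch of how `return` pairs up with the header, the hypothetical function
below has a single output in `eax`, so its `return` takes a single argument.
The names `add-one`, `caller`, `n`, `result` and `a` are assumptions made up
for this example.

```
fn add-one n: int -> _/eax: int {
  var result/eax: int <- copy n   # inouts live in memory; load into a register
  result <- increment
  return result                   # assigned to the output in `eax`
}

fn caller {
  var a/eax: int <- add-one 3     # output register matches the header
}
```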

The function `main` is special; it is where the program starts running. It
must always return a single int in register `ebx` (as the exit status of the
process). It can also optionally accept an array of strings as input (from the
shell command-line). To be precise, `main` must have one of the following
two signatures:

- `fn main -> _/ebx: int`
- `fn main args: (addr array (addr array byte)) -> _/ebx: int`

(The names of the inout and output are flexible.)

Mu encloses multi-word types in parentheses, and types can get quite expressive.
For example, you read `main`'s inout type as "an address to an array of
addresses to arrays of bytes." Since addresses to arrays of bytes are almost
always strings in Mu, you'll quickly learn to mentally shorten this type to
"an address to an array of strings".

## Blocks

Blocks are useful for grouping related statements. They're delimited by `{`
and `}`, each alone on a line.

Blocks can nest:

```
{
  _statements_
  {
    _more statements_
  }
}
```

Blocks can be named (with the name ending in a `:` on the same line as the
`{`):

```
$name: {
  _statements_
}
```

Further down we'll see primitive statements for skipping or repeating blocks.
Besides control flow, the other use for blocks is...

## Local variables

Functions can define new variables at any time with the keyword `var`. There
are two variants of the `var` statement, for defining variables in registers
or memory.

```
var name: type
var name/reg: type <- ...
```

Variables on the stack are never initialized explicitly. (They're always
implicitly zeroed out.) Variables in registers are always initialized, by the
statement that defines them.

Register variables can go in 6 integer registers (`eax`, `ebx`, `ecx`, `edx`,
`esi`, `edi`) or 8 floating-point registers (`xmm0`, `xmm1`, `xmm2`, `xmm3`,
`xmm4`, `xmm5`, `xmm6`, `xmm7`).

Defining a variable in a register either clobbers the previous variable (if it
was defined in the same block) or shadows it temporarily (if it was defined in
an outer block).

Variables exist from their definition until the end of their containing block.
Register variables may also die earlier if their register is clobbered by a
new variable.
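
Here's a small sketch of both cases, shadowing and clobbering, using a single
register. The function and variable names are invented for illustration.

```
fn scopes {
  var a/eax: int <- copy 1
  {
    var b/eax: int <- copy 2   # `a` is from an outer block: shadowed in here
    b <- increment
  }
  a <- increment               # the inner block has ended; `a` is usable again
  var c/eax: int <- copy 3     # same block as `a`: clobbers it, so `a` dies here
  c <- increment
}
```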

Variables on the stack can be of many types (but not `byte`). Integer registers
can only contain 32-bit values: `int`, `byte`, `boolean`, `(addr ...)`. Floating-point
registers can only contain values of type `float`.

## Integer primitives

Here is the list of arithmetic primitive operations supported by Mu. The name
`n` indicates a literal integer rather than a variable, and `var/reg` indicates
a variable in a register, though that's not always valid Mu syntax.

```
var/reg <- increment
increment var
var/reg <- decrement
decrement var
var1/reg1 <- add var2/reg2
var/reg <- add var2
add-to var1, var2/reg
var/reg <- add n
add-to var, n

var1/reg1 <- sub var2/reg2
var/reg <- sub var2
sub-from var1, var2/reg
var/reg <- sub n
sub-from var, n

var1/reg1 <- and var2/reg2
var/reg <- and var2
and-with var1, var2/reg
var/reg <- and n
and-with var, n

var1/reg1 <- or var2/reg2
var/reg <- or var2
or-with var1, var2/reg
var/reg <- or n
or-with var, n

var1/reg1 <- xor var2/reg2
var/reg <- xor var2
xor-with var1, var2/reg
var/reg <- xor n
xor-with var, n

var1/reg1 <- negate
negate var

var/reg <- copy var2/reg2
copy-to var1, var2/reg
var/reg <- copy var2
var/reg <- copy n
copy-to var, n

compare var1, var2/reg
compare var1/reg, var2
compare var/eax, n
compare var, n

var/reg <- shift-left n
var/reg <- shift-right n
var/reg <- shift-right-signed n
shift-left var, n
shift-right var, n
shift-right-signed var, n

var/reg <- multiply var2
```

Any statement above that takes a variable in memory can be replaced with a
dereference (`*`) of an address variable (of type `(addr ...)`) in a register.
(Types can have multiple words, and are wrapped in `()` when they do.) But you
can't dereference variables in memory. You have to load them into a register
first.
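
For instance, here's a sketch of two memory-operand statements from the list
above, rewritten to go through an address in a register. It borrows the
`address` operation from the "Addresses" section; the function and variable
names are hypothetical.

```
fn dereference-example {
  var n: int                           # in memory
  var p/eax: (addr int) <- address n   # address of `n`, held in a register
  increment *p                         # the `increment var` form, applied to `n` via `p`
  var m/ecx: int <- copy *p            # the `var/reg <- copy var2` form, reading `n` via `p`
}
```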

Excluding dereferences, the above statements must operate on non-address
values with primitive types: `int`, `boolean` or `byte`. (Booleans are really
just `int`s, and Mu assumes any value but `0` is true.) You can copy addresses
to int variables, but not the other way around.

## Floating-point primitives

These instructions may use the floating-point registers `xmm0` ... `xmm7`
(denoted by `/xreg` and `/xreg2`). They also use integer registers on occasion
(`/reg` and `/reg2`), to hold addresses or integer values. They can't take
literal floating-point values.

```
var/xreg <- add var2/xreg2
var/xreg <- add var2
var/xreg <- add *var2/reg2

var/xreg <- subtract var2/xreg2
var/xreg <- subtract var2
var/xreg <- subtract *var2/reg2

var/xreg <- multiply var2/xreg2
var/xreg <- multiply var2
var/xreg <- multiply *var2/reg2

var/xreg <- divide var2/xreg2
var/xreg <- divide var2
var/xreg <- divide *var2/reg2

var/xreg <- reciprocal var2/xreg2
var/xreg <- reciprocal var2
var/xreg <- reciprocal *var2/reg2

var/xreg <- square-root var2/xreg2
var/xreg <- square-root var2
var/xreg <- square-root *var2/reg2

var/xreg <- inverse-square-root var2/xreg2
var/xreg <- inverse-square-root var2
var/xreg <- inverse-square-root *var2/reg2

var/xreg <- min var2/xreg2
var/xreg <- min var2
var/xreg <- min *var2/reg2

var/xreg <- max var2/xreg2
var/xreg <- max var2
var/xreg <- max *var2/reg2
```

Remember, when these instructions use indirect mode, they still use an integer
register. Floating-point registers can't hold addresses.

Two instructions in the above list are approximate. According to the Intel
manual, `reciprocal` and `inverse-square-root` [go off the rails around the
fourth decimal place](x86_approx.md). If you need more precision, use `divide`
separately.

Most instructions operate exclusively on integer or floating-point operands.
The only exceptions are the instructions for converting between integers and
floating-point numbers.

```
var/xreg <- convert var2/reg2
var/xreg <- convert var2
var/xreg <- convert *var2/reg2

var/reg <- convert var2/xreg2
var/reg <- convert var2
var/reg <- convert *var2/reg2

var/reg <- truncate var2/xreg2
var/reg <- truncate var2
var/reg <- truncate *var2/reg2
```

There are no instructions accepting floating-point literals. To obtain integer
literals in floating-point registers, copy them to general-purpose registers
and then convert them to floating-point.
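
As a sketch of that two-step pattern, the hypothetical function below builds
the value 1.5 from the integer literals 3 and 2. All names in it are made up
for illustration.

```
fn float-example {
  var three/eax: int <- copy 3
  var x/xmm0: float <- convert three   # x = 3.0
  var two/ecx: int <- copy 2
  var y/xmm1: float <- convert two     # y = 2.0
  x <- divide y                        # x = 1.5
}
```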

One pattern you may have noticed is that the floating-point instructions above
always write to registers. The only exceptions are `copy` instructions, which
can write to memory locations.

```
var/xreg <- copy var2/xreg2
copy-to var1, var2/xreg
var/xreg <- copy var2
var/xreg <- copy *var2/reg2
```

Floating-point comparisons always put a register on the left-hand side:

```
compare var1/xreg1, var2/xreg2
compare var1/xreg1, var2
```

## Operating on individual bytes

A special case is variables of type `byte`. Mu is a 32-bit platform so for the
most part only supports types that are multiples of 32 bits. However, we do
want to support strings in ASCII and UTF-8, which will be arrays of 8-bit
bytes.

Since most x86 instructions implicitly load 32 bits at a time from memory,
variables of type `byte` are only allowed in registers, not on the stack. Here
are the possible statements for copying bytes to and from memory:

```
var/reg <- copy-byte var2/reg2      # var: byte, var2: byte
var/reg <- copy-byte *var2/reg2     # var: byte, var2: (addr byte)
copy-byte-to *var1/reg1, var2/reg2  # var1: (addr byte), var2: byte
```

In addition, variables of type `byte` are restricted to (the lowest bytes of)
just 4 registers: `eax`, `ecx`, `edx` and `ebx`. As always, this is due to
constraints of the x86 instruction set.
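
Here's a rough sketch of moving one byte between two strings. It leans on the
`index` statement from the "Array operations" section, and it assumes that an
address inout can be loaded into a register with `copy`. The function name and
all variable names are hypothetical.

```
fn copy-first-byte src: (addr array byte), dest: (addr array byte) {
  var s/esi: (addr array byte) <- copy src
  var d/edi: (addr array byte) <- copy dest
  var from/ecx: (addr byte) <- index s, 0
  var to/edx: (addr byte) <- index d, 0
  var b/eax: byte <- copy-byte *from   # `eax` is one of the 4 registers that can hold a byte
  copy-byte-to *to, b
}
```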

## Primitive jumps

There are two kinds of jumps, both with many variations: `break` and `loop`.
`break` instructions jump to the end of the containing block. `loop` instructions
jump to the beginning of the containing block.

All jumps can take an optional label starting with `$`:

```
loop $foo
```

This instruction jumps to the beginning of the block called `$foo`. The corresponding
`break` jumps to the end of the block. Either jump statement must lie somewhere
inside such a block. Jumps are only legal to containing blocks. (Use named
blocks with restraint; jumps to places far away can get confusing.)

There are two unconditional jumps:

```
loop
loop label
break
break label
```

The remaining jump instructions are all conditional. Conditional jumps rely on
the result of the most recently executed `compare` instruction. (To keep
programs easy to read, keep compare instructions close to the jump that uses
them.)

```
break-if-=
break-if-= label
break-if-!=
break-if-!= label
```

Inequalities are similar, but have additional variants for addresses and floats.

```
break-if-<
break-if-< label
break-if->
break-if-> label
break-if-<=
break-if-<= label
break-if->=
break-if->= label

break-if-addr<
break-if-addr< label
break-if-addr>
break-if-addr> label
break-if-addr<=
break-if-addr<= label
break-if-addr>=
break-if-addr>= label

break-if-float<
break-if-float< label
break-if-float>
break-if-float> label
break-if-float<=
break-if-float<= label
break-if-float>=
break-if-float>= label
```

Similarly, conditional loops:

```
loop-if-=
loop-if-= label
loop-if-!=
loop-if-!= label

loop-if-<
loop-if-< label
loop-if->
loop-if-> label
loop-if-<=
loop-if-<= label
loop-if->=
loop-if->= label

loop-if-addr<
loop-if-addr< label
loop-if-addr>
loop-if-addr> label
loop-if-addr<=
loop-if-addr<= label
loop-if-addr>=
loop-if-addr>= label

loop-if-float<
loop-if-float< label
loop-if-float>
loop-if-float> label
loop-if-float<=
loop-if-float<= label
loop-if-float>=
loop-if-float>= label
```
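
Putting blocks, `compare` and jumps together, here's a sketch of a counting
loop. The inner block repeats via an unlabeled `loop`, and a labeled
`break-if->=` leaves the named outer block. The names `count-up`, `$count` and
`i` are made up for this example.

```
fn count-up {
  var i/eax: int <- copy 0
  $count: {
    {
      compare i, 5
      break-if->= $count   # i >= 5: jump to the end of the block named `$count`
      i <- increment
      loop                 # jump to the beginning of the inner block
    }
  }
}
```

Note how the `compare` sits right before the jump that depends on it.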

## Addresses

Passing objects by reference requires the `address` operation, which returns
an object of type `addr`.

```
var/reg: (addr T) <- address var2: T
```

Here `var2` can't live in a register.

## Array operations

Here's an example definition of a fixed-length array:

```
var x: (array int 3)
```

The length (here `3`) must be an integer literal. We'll show how to create
dynamically-sized arrays further down.

Arrays can be large; to avoid copying them around on every function call
you'll usually want to manage `addr`s to them. Here's an example computing the
address of an array.

```
var n/eax: (addr array int) <- address x
```

Addresses to arrays don't include the array length in their type. However, you
can obtain the length of an array like this:

```
var/reg: int <- length arr/reg: (addr array T)
```

To operate on elements of an array, use the `index` statement:

```
var/reg: (addr T) <- index arr/reg: (addr array T), n
var/reg: (addr T) <- index arr: (array T len), n
```

The index can also be a variable in a register, with a caveat:

```
var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: int
var/reg: (addr T) <- index arr: (array T len), idx/reg: int
```

The caveat: the size of T must be 1, 2, 4 or 8 bytes. The x86 instruction set
has complex addressing modes that can index into an array in a single instruction
in these situations.
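
As an illustration, here's a sketch that sets every element of a 3-element
`int` array to 1, indexing with a register variable. It combines `index` with
the jumps from the previous section. The function and variable names are
hypothetical.

```
fn fill-with-ones {
  var a: (array int 3)
  var a-addr/esi: (addr array int) <- address a
  var i/eax: int <- copy 0
  {
    compare i, 3
    break-if->=
    var elem/edx: (addr int) <- index a-addr, i   # an `int` is 4 bytes, so a register index is fine
    copy-to *elem, 1
    i <- increment
    loop
  }
}
```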

For other sizes of T you'll need to split up the work, performing a `compute-offset`
before the `index`.

```
var/reg: (offset T) <- compute-offset arr: (addr array T), idx/reg: int  # arr can be in reg or mem
var/reg: (offset T) <- compute-offset arr: (addr array T), idx: int      # arr can be in reg or mem
```

The `compute-offset` statement returns a value of type `(offset T)` after
performing any necessary bounds checking. Now the offset can be passed to
`index` as usual:

```
var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: (offset T)
```
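
Here's a sketch of the two-step pattern for an element type that is 12 bytes
wide. The `rgb` type, its field names and the function `clear-red` are all
invented for this example; `type` definitions and the `get` statement are
covered under "Compound types".

```
type rgb {
  red: int
  green: int
  blue: int
}

fn clear-red pixels: (addr array rgb), i: int {
  var arr/esi: (addr array rgb) <- copy pixels
  var off/ecx: (offset rgb) <- compute-offset arr, i   # 12-byte elements need an explicit offset
  var pixel/eax: (addr rgb) <- index arr, off
  var r/edx: (addr int) <- get pixel, red
  copy-to *r, 0
}
```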

## Stream operations

A common use for arrays is as buffers. Save a few items to a scratch space and
then process them. This pattern is so common (we use it in files) that there's
special support for it with a built-in type: `stream`.

Streams are like arrays in many ways. You can initialize them with a length:

```
var x: (stream int 3)
```

However, streams don't provide random access with an `index` instruction.
Instead, you write to them sequentially, and read back what you wrote.

```
read-from-stream s: (addr stream T), out: (addr T)
write-to-stream s: (addr stream T), in: (addr T)
var/eax: boolean <- stream-empty? s: (addr stream _)
var/eax: boolean <- stream-full? s: (addr stream _)
```
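
A sketch of a write followed by a read might look like this. It assumes that
`address` works on a stream variable the same way it does on an array, dropping
the length from the type. All names are made up for illustration.

```
fn stream-example {
  var storage: (stream int 3)
  var s/esi: (addr stream int) <- address storage
  var x: int
  copy-to x, 7
  var x-addr/eax: (addr int) <- address x
  write-to-stream s, x-addr
  var y: int
  var y-addr/ecx: (addr int) <- address y
  read-from-stream s, y-addr   # `y` should now hold the 7 written above
}
```

Both operations take an address to the element rather than the element
itself, so the same calls work for element types that don't fit in a register.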

You can clear streams:

```
clear-stream f: (addr stream _)
```

You can also rewind them to reread what's been written:

```
rewind-stream f: (addr stream _)
```

## Compound types

Primitive types can be combined together using the `type` keyword. For
example:

```
type point {
  x: int
  y: int
}
```

Mu programs are currently sequences of `fn` and `type` definitions.

Compound types can't include `addr` types for safety (use `handle` instead,
which is described below). They also can't currently include `array`, `stream`
or `byte` types. Since arrays and streams carry their size with them, supporting
them in compound types complicates variable initialization. Instead of
defining them inline in a type definition, define a `handle` to them. Bytes
shouldn't be used for anything but utf-8 strings.

To access within a compound type, use the `get` instruction. There are two
forms. You need either a variable of the type itself (say `T`) in memory, or a
variable of type `(addr T)` in a register.

```
var/reg: (addr T_f) <- get var/reg: (addr T), f
var/reg: (addr T_f) <- get var: T, f
```

The `f` here is the field name from the `type` definition, and its type `T_f`
must match the type of `f` in the `type` definition. For example, some legal
instructions for the definition of `point` above:

```
var a/eax: (addr int) <- get p, x
var a/eax: (addr int) <- get p, y
```
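
Since `get` returns an address, reading or writing the field takes one more
statement with a dereference. Here's a sketch that fills in a `point` on the
stack; the function name and the variables `px` and `py` are hypothetical.

```
fn set-point {
  var p: point                         # on the stack, so zeroed out
  var px/eax: (addr int) <- get p, x
  copy-to *px, 3                       # p.x = 3
  var py/ecx: (addr int) <- get p, y
  copy-to *py, 4                       # p.y = 4
}
```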

## Handles for safe access to the heap

We've seen the `addr` type, but it's intended to be short-lived. `addr` values
should never escape from functions. In particular, don't save `addr` values
inside compound `type`s. For that you need a "fat pointer" called a `handle`,
which is safe to keep around for extended periods and ensures it's used safely
without corrupting the heap and causing security issues or hard-to-debug
misbehavior.

To actually _use_ a `handle`, we have to turn it into an `addr` first using
the `lookup` statement.

```
var y/reg: (addr T) <- lookup x
```

Now operate on the `addr` as usual, safe in the knowledge that you can later
recover any writes to its payload from `x`.

It's illegal to continue to use this `addr` after a function that reclaims
heap memory. You have to repeat the lookup from the `handle`. (Luckily Mu
doesn't implement reclamation yet.)

Having two kinds of addresses takes some getting used to. Do we pass in
variables by value, by `addr` or by `handle`? In inouts or outputs? Here are 3
rules of thumb:

* Functions that need to look at the payload should accept an `(addr ...)`.
* Functions that need to treat a handle as a value, without looking at its
  payload, should accept a `(handle ...)`. Helpers that save handles into data
  structures are a common example.
* Functions that need to allocate memory should accept an `(addr handle
  ...)`.

Try to avoid mixing these use cases.

If you have a variable `src` of type `(handle ...)`, you can save it inside a
compound type like this (provided the types match):

```
var dest/reg: (addr handle T_f) <- get var: (addr T), f
copy-handle src, dest
```

Or this:

```
var dest/reg: (addr handle T) <- index arr: (addr array handle T), n
copy-handle src, dest
```

To create handles to non-array types, use `allocate`:

```
var x: (addr handle T)
... initialize x ...
allocate x
```

To create handles to array types, use `populate`:

```
var x: (addr handle array T)
... initialize x ...
populate x, 3  # array of 3 T's
```

You can copy handles to another variable on the stack like this:

```
var x: (handle T)
# ..some code initializing x..
var y/eax: (addr handle T) <- address ...
copy-handle x, y
```
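
Tying these pieces together, here's a sketch of the full life cycle of a
handle: allocate a payload, look it up, and write to a field. It reuses the
`point` type defined earlier and assumes `lookup` accepts a handle variable on
the stack and returns its result in `eax`. The function and variable names are
made up.

```
fn handle-example {
  var h: (handle point)
  var h-addr/ecx: (addr handle point) <- address h
  allocate h-addr                       # `h` now refers to a new `point` on the heap
  var p/eax: (addr point) <- lookup h   # short-lived `addr` for actually using the payload
  var px/edx: (addr int) <- get p, x
  copy-to *px, 1
}
```

Following the rules of thumb above, `p` stays inside this function; only `h`
would be saved into a data structure or handed to other code.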

## Seams

I said at the start that most instructions map 1:1 to x86 machine code. To
enforce type- and memory-safety, I was forced to carve out a few exceptions:

* the `index` instruction on arrays, for bounds-checking (not yet implemented)
* the `length` instruction on arrays, for translating the array size in bytes
  into the number of elements
* the `lookup` instruction on handles, for validating fat-pointer metadata
* `var` instructions, for initializing memory

If you're curious, [the compiler summary page](http://akkartik.github.io/mu/html/mu_instructions.html)
has the complete nitty-gritty on how each instruction is implemented, including
the above exceptions.

## Conclusion

Anything not allowed here is forbidden, even if the compiler doesn't currently
detect and complain about it. Please [contact me](mailto:ak@akkartik.com) or
[report issues](https://github.com/akkartik/mu/issues) when you encounter a
missing or misleading error message. Thank you for bearing with the dust! I'm
here for the long haul, and everything will be clean and checked in due time.