This commit reimplements commit 6515 to happen during type-checking rather
than as early as possible. That way we naturally get a more informative
error message.
The new failing test is now passing, and so is this manual test that had
been throwing a spurious error:
  fn foo {
    var a/eax: int <- copy 0
    var b/ebx: int <- copy 0
    {
      var a1/eax: int <- copy 0
      var b1/ebx: int <- copy a1
    }
    b <- copy a
  }
However, factorial.mu is still throwing a spurious error.
Some history on this commit's fix: when I moved stack-location tracking
out of the parsing phase (commit 6116, Mar 10), I thoughtlessly moved
block-depth tracking as well. That happened because I'd somehow gotten by
without ever cleaning up vars from a block during parsing. Despite all my
tests, this is a troubling sign that I'm not testing enough.
The good news: clean-up-blocks works perfectly during parsing.
Before: bytes can't live on the stack, so size(byte) == 1 just for array
elements.
After: bytes mostly can't live on the stack except for function args (which
seem too useful to disallow), so size(byte) == 4 except there's now a new
primitive called element-size for array elements where size(byte) == 1.
Now apps/browse.subx starts working again.
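To make the before/after concrete, here is a minimal Python sketch of the two size functions. `size_of`, `element_size` and the toy `SIZES` table are hypothetical stand-ins, not the Mu compiler's code, and the sketch assumes every non-array type occupies one 4-byte word.

```python
# Hypothetical model of the two size primitives described above.
# Assumption: every type in this toy table occupies one 4-byte word.
WORD = 4
SIZES = {"int": WORD, "byte": WORD}  # a lone byte is padded out to a word

def size_of(type_name: str) -> int:
    """Size of a variable of this type on the stack (e.g. a function arg)."""
    return SIZES[type_name]

def element_size(type_name: str) -> int:
    """Size of one element packed inside an array."""
    if type_name == "byte":
        return 1  # only inside arrays does a byte take a single byte
    return size_of(type_name)

def array_elements_size(elem_type: str, length: int) -> int:
    """Bytes occupied by `length` packed elements (excluding any header)."""
    return element_size(elem_type) * length

print(size_of("byte"), element_size("byte"), array_elements_size("byte", 10))  # 4 1 10
```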
Several bugs fixed in the process, and expectation of further bugs is growing.
I'd somehow started assuming I don't need to have separate cases for rm32
as a register vs mem. That's not right. We might need more reg-reg Primitives.
Most unbelievably, I'd forgotten to pass the output 'out' arg to 'lookup-var'
long before the recent additions of 'err' and 'ed' args. But things continued
to work because an earlier call happened to leave the arg at exactly the
right place on the stack. So we only caught all these places when we
had to provide error messages.
Byte-oriented addressing is only supported in a couple of instructions
in SubX. As a result, variables of type 'byte' can't live on the stack,
or in registers 'esi' and 'edi'.
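The register half of this restriction comes straight from x86 encoding, sketched below in Python. `low_byte_register` and `can_hold_byte` are hypothetical helpers; the sketch also rejects esp and ebp, which are normally reserved for the stack and frame pointers anyway.

```python
# 32-bit x86 register numbers, in the order ModR/M encodings use.
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# With an 8-bit operand size, the same 3-bit field selects these instead.
# Numbers 4-7 name the *high* bytes of eax..ebx, so the low bytes of
# esp, ebp, esi and edi can't be addressed at all in 32-bit mode.
BYTE_REGISTERS = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

def low_byte_register(reg: str):
    """Name of reg's low byte, or None if 32-bit x86 can't address it."""
    n = REGISTERS.index(reg)
    return BYTE_REGISTERS[n] if n < 4 else None

def can_hold_byte(reg: str) -> bool:
    # Hypothetical check for a declaration like `var c/esi: byte`.
    return low_byte_register(reg) is not None

print([r for r in REGISTERS if can_hold_byte(r)])  # ['eax', 'ecx', 'edx', 'ebx']
```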
I had a little "optimization" to avoid creating nested blocks if "they weren't
needed". Except, of course, they were. Lose the optimization. Sometimes
we create multiple jumps when a single one would suffice. Ignore that for
now.
The rule: emit spills for a register unless the output is written somewhere
in the current block after the current instruction, including in nested
blocks.
Let's see if this is right.
Rather than have two ways to decide whether to emit push/pop instructions,
just record for each var on the 'vars' stack whether we emitted a push
for it, and reuse the decision to emit a pop.
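The spill rule and the push/pop bookkeeping fit together in a short Python sketch. Everything here is hypothetical (`Stmt`, `needs_spill`, `VarEntry`, representing a nested block as a plain list), the spill check is read as "is this register written again later", and the SubX strings assume the `ff 6/subop/push` and `8f 0/subop/pop` forms of x86's push/pop r/m32.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Stmt:
    op: str
    outputs: List[str] = field(default_factory=list)  # registers it writes

def writes_register(items: list, reg: str) -> bool:
    """True if any statement in items, or in a block nested inside, writes reg.
    A nested block is represented as a plain Python list of items."""
    for item in items:
        if isinstance(item, list):
            if writes_register(item, reg):
                return True
        elif reg in item.outputs:
            return True
    return False

def needs_spill(block: list, index: int, reg: str) -> bool:
    """Spill reg for the var defined at block[index] unless reg is written
    again later in the current block, including in nested blocks."""
    return not writes_register(block[index + 1:], reg)

@dataclass
class VarEntry:
    name: str
    register: str
    pushed: bool  # decided once, when the var is defined

def define_var(stack: List[VarEntry], code: List[str], name: str, reg: str, spill: bool):
    if spill:
        code.append(f"ff 6/subop/push %{reg}")
    stack.append(VarEntry(name, reg, spill))

def close_block(stack: List[VarEntry], code: List[str], height: int):
    """Pop vars down to `height`, emitting a pop only where we emitted a push."""
    while len(stack) > height:
        var = stack.pop()
        if var.pushed:
            code.append(f"8f 0/subop/pop %{var.register}")
```

Because the push decision is stored on the stack entry, the pop can never disagree with it, which is the point of the change.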
Observations:
- the orchestration from 'in' to 'addr-in' to '_in-addr' to 'in-addr'
  is quite painful. Once to turn a handle into its address, once to turn
  a handle into the address of its payload, and a third time to switch
  a variable out of the overloaded 'eax' variable to make room for read-byte-buffered.
- I'm starting to use SubX as an escape hatch for features missing in Mu:
  - access to syscalls (which pass args in registers)
  - access to global variables
How did new-literal ever work?! Somehow we had eax silently being clobbered
without affecting behavior over like 5 apps. Unsafe languages suck.
Anyways, factorial.mu is now part of CI.
So far it's unclear how to do this in a series of small commits. Still
nibbling around the edges. In this commit we standardize some terminology:
The length of an array or stream is denominated in the high-level elements.
The _size_ is denominated in bytes.
The thing we encode into the type is always the size, not the length.
There's still an open question of what to do about the Mu `length` operator.
I'd like to modify it to provide the length. Currently it provides the
size. If I can't fix that I'll rename it.
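The two units are related by the element size. This trivial Python sketch (hypothetical helper names, 4-byte ints assumed) shows the division a `length` operator would have to perform on the encoded size to return the length.

```python
def size_from_length(length: int, elem_size: int) -> int:
    """Size (bytes) of an array holding `length` elements."""
    return length * elem_size

def length_from_size(size: int, elem_size: int) -> int:
    """Length (elements) of an array whose encoded size is `size` bytes."""
    if size % elem_size != 0:
        raise ValueError("size must be a multiple of the element size")
    return size // elem_size

# An array of 3 ints, assuming 4-byte ints:
print(size_from_length(3, 4), length_from_size(12, 4))  # 12 3
```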
At the lowest level, SubX without syntax sugar uses names without prepositions.
For example, 01 and 03 are both called 'add', irrespective of source and
destination operand. Horizontal space is at a premium, and we rely on the
comments at the end of each line to fully describe what is happening.
Above that, however, we standardize on a slightly different naming convention
across:
a) SubX with syntax sugar,
b) Mu, and
c) the SubX code that the Mu compiler emits.
Conventions, in brief:
- by default, the source is on the left and destination on the right.
    e.g. add %eax, 1/r32/ecx ("add eax to ecx")
- prepositions reverse the direction.
    e.g. add-to %eax, 1/r32/ecx ("add ecx to eax")
         subtract-from %eax, 1/r32/ecx ("subtract ecx from eax")
- by default, comparisons are left to right while 'compare<-' reverses.
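These conventions describe data flow only, so a toy interpreter can pin them down. The Python sketch below is hypothetical: `execute` models registers as a dict, covers just the three example instructions, and ignores flags and encodings.

```python
def execute(op: str, left: str, right: str, regs: dict) -> None:
    """Apply a sugared-SubX-style instruction to a dict of register values.
    `execute("add", "eax", "ecx", regs)` mirrors `add %eax, 1/r32/ecx`."""
    if op == "add":              # default: source left, destination right
        regs[right] += regs[left]
    elif op == "add-to":         # preposition: destination on the left
        regs[left] += regs[right]
    elif op == "subtract-from":
        regs[left] -= regs[right]
    else:
        raise ValueError(f"unknown op {op}")

regs = {"eax": 5, "ecx": 2}
execute("add", "eax", "ecx", regs)            # ecx = 2 + 5 = 7
execute("subtract-from", "eax", "ecx", regs)  # eax = 5 - 7 = -2
print(regs)  # {'eax': -2, 'ecx': 7}
```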
Before, I was sometimes swapping args to make the operation more obvious,
but that would complicate the code-generation of the Mu compiler, and it's
nice to be able to read the output of the compiler just like hand-written
code.
One place where SubX differs from Mu: copy opcodes are called '<-' and
'->'. Hopefully that fits with the spirit of Mu rather than the letter
of the 'copy' and 'copy-to' instructions.
At the SubX level we have to put up with null-terminated kernel strings
for commandline args. But so far we haven't done much with them. Rather
than try to support them we'll just convert them transparently to standard
length-prefixed strings.
In the process I realized that it's not quite right to treat the combination
of argc and argv as an array of kernel strings. Argc counts the number
of elements, whereas the length of an array is usually denominated in bytes.
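The conversion itself is simple; here is a Python sketch. The helper names are hypothetical, and the output layout (a 4-byte little-endian count followed by the bytes) is an assumption about length-prefixed strings on 32-bit x86.

```python
import struct

def to_length_prefixed(raw: bytes, start: int = 0) -> bytes:
    """Convert the null-terminated kernel string at raw[start:] into a
    length-prefixed string: a 4-byte little-endian byte count, then the bytes."""
    end = raw.index(b"\0", start)  # ValueError if there's no terminator
    payload = raw[start:end]
    return struct.pack("<I", len(payload)) + payload

def argv_array_size(argc: int) -> int:
    """argc counts elements; the array of 32-bit pointers it describes
    occupies argc * 4 bytes."""
    return argc * 4

print(to_length_prefixed(b"hello\0"))  # b'\x05\x00\x00\x00hello'
```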
I built this in 3 phases:
a) create a helper in the bootstrap VM to render the state of the stack.
b) interactively arrive at the right function (tools/stack_array.subx)
c) pull the final solution into the standard library (093stack_allocate.subx)
As the final layer says, this may not be the fastest approach for most
(or indeed any) Mu programs. Perhaps it's better on balance for the compiler
to just emit n/4 `push` instructions.
(I'm sure this solution can be optimized further.)
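For comparison, the simpler alternative ("just emit n/4 `push` instructions") might look like this Python sketch of a code generator. `emit_stack_allocation` is hypothetical; it reuses the `68/push 0/imm32` and `81 0/subop/add %esp` forms that appear later in this log, writes the size in hex as SubX expects, and leaves out any array header.

```python
def emit_stack_allocation(size: int):
    """Reserve `size` zeroed bytes with one 4-byte push per word, and return
    the matching cleanup. SubX numbers are hex, hence the 0x formatting."""
    if size % 4 != 0:
        raise ValueError("size must be a multiple of 4")
    alloc = ["68/push 0/imm32"] * (size // 4)
    cleanup = [f"81 0/subop/add %esp 0x{size:x}/imm32"]
    return alloc, cleanup

alloc, cleanup = emit_stack_allocation(12)
print(len(alloc), cleanup[0])  # 3 81 0/subop/add %esp 0xc/imm32
```

The trade-off is code size: the instruction count grows linearly with the array, where a runtime helper stays constant.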
We can't do it at parse time because we may not have all type definitions
available yet. Mu supports using types before defining them.
At first I thought I should do it in populate-mu-type-sizes (appropriately
renamed). But there's enough complexity to tracking when stuff lands on
the stack that it's easiest to do while emitting code.
I don't think we need this information earlier in the compiler. If I'm
right, it seems simpler to colocate the computation of state close to where
it's used.
Move out total-size computation from parsing to a separate phase.
I don't have any new tests yet, but it's encouraging that existing tests
continue to pass.
This may be the first time I've ever written this much machine code (with
mutual recursion!) and gotten it to work the first time.
Make room for additional information for each field in a record/product
type.
Fields can be used before they're declared, and we may not know the offsets
they correspond to at that point. This is going to necessitate a lot of
restructuring.
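Here is a Python sketch of deferring both computations (record sizes and field offsets) to a phase that runs after parsing. `TypeTable` is hypothetical. It assumes records are laid out as the plain sum of their field sizes with no padding, and that `int` takes 4 bytes. It recurses through field types and memoizes the results, rather than mirroring the SubX code's mutual recursion.

```python
class TypeTable:
    """Sizes and field offsets, computed only after every record is parsed,
    since Mu lets a type be used before it is defined."""

    def __init__(self, records):
        self.records = records       # name -> [(field name, field type), ...]
        self.sizes = {"int": 4}      # assumed builtin sizes
        self.in_progress = set()     # records currently being sized

    def size_of(self, t):
        if t in self.sizes:
            return self.sizes[t]
        if t not in self.records:
            raise KeyError(f"unknown type {t}")
        if t in self.in_progress:
            raise ValueError(f"record {t} contains itself")
        self.in_progress.add(t)
        total = sum(self.size_of(ft) for _, ft in self.records[t])
        self.in_progress.discard(t)
        self.sizes[t] = total        # memoize
        return total

    def offset_of(self, record, field):
        """Byte offset of `field` within `record`: the sum of earlier fields."""
        offset = 0
        for name, ft in self.records[record]:
            if name == field:
                return offset
            offset += self.size_of(ft)
        raise KeyError(f"{record} has no field {field}")

# `line` refers to `point` before `point` is defined.
types = TypeTable({
    "line": [("start", "point"), ("end", "point")],
    "point": [("x", "int"), ("y", "int")],
})
print(types.size_of("line"), types.offset_of("line", "end"))  # 16 8
```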
If indexing into a type with power-of-2-sized elements we can access them
in one instruction:
  x/reg1: (addr int) <- index A/reg2: (addr array int), idx/reg3: int
This translates to a single instruction because x86's scaled-index addressing
mode can left-shift the index, multiplying it by 1, 2, 4 or 8.
For non-powers-of-2, however, we need a multiply. To keep things type-safe,
it is performed like this:
  x/reg1: (offset T) <- compute-offset A: (addr array T), idx: int
  y/reg2: (addr T) <- index A, x
An offset is just an int that is guaranteed to be a multiple of size-of(T).
Offsets can only be used in index instructions, and the types will eventually
be required to line up.
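Here is the address arithmetic behind both forms as a Python sketch. The names echo the Mu instructions above but are hypothetical, and the 4-byte header in front of the elements is an assumption about array layout. `Offset` models the type-safety rule by remembering which element size it was computed for. Sizes outside 1, 2, 4 and 8 take the multiply path, since larger powers of 2 don't fit the addressing mode's scale.

```python
from dataclasses import dataclass

HEADER = 4  # assumption: the elements follow a 4-byte size field

@dataclass(frozen=True)
class Offset:
    """An int guaranteed to be a multiple of the element size it was computed for."""
    value: int
    elem_size: int

def compute_offset(idx: int, elem_size: int) -> Offset:
    return Offset(idx * elem_size, elem_size)  # the multiply lives here

def index_with_offset(base: int, off: Offset, elem_size: int) -> int:
    if off.elem_size != elem_size:
        raise TypeError("offset was computed for a different element type")
    return base + HEADER + off.value

def index(base: int, idx: int, elem_size: int) -> int:
    """Address of element idx of the array at `base`."""
    if elem_size in (1, 2, 4, 8):
        # One instruction: x86 scaled-index addressing computes
        # base + (idx << shift) + displacement with shift in 0..3.
        shift = elem_size.bit_length() - 1
        return base + (idx << shift) + HEADER
    # Any other size needs a multiply first, via a typed offset.
    return index_with_offset(base, compute_offset(idx, elem_size), elem_size)
```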
In the process, I had to expand Input-size because mu.subx is growing
big.
In the process I'm starting to realize that my approach to avoiding spills
isn't ideal. It works for local variables but not to avoid spilling outputs.
To correctly decide whether to spill to an output register or not, we really
need to analyze when a variable is live. If we don't do that, we'll end
up in one of two bad situations:
a) Don't spill the outermost use of an output register (or just the outermost
scope in a function). This is weird because it's hard to explain to the
programmer why they can overwrite a local with an output above a '{' but
not below.
b) Disallow overwriting entirely. This is easier to communicate but quite
inconvenient. It's nice to be able to use eax for some temporary purpose
before overwriting it with the final result of a function.
If we instead track liveness, things are convenient and also easier to
explain. If a temporary is used after the output has been written that's
an obvious problem: "you clobbered the output". (It seems more reasonable
to disallow multiple live ranges for the output. Once an output is written
it can only be shadowed in a nested block.)
That's the bad news. Now for some good news:
One lovely property Mu the language has at the moment is that live ranges
are guaranteed to be linear segments of code. We don't need to analyze
loop-carried dependences. This means that we can decide whether a variable
is live purely by scanning later statements for its use. (Defining 'register
use' is slightly non-trivial; primitives must somehow specify when they
read their output register.)
So we don't actually need to worry about a loop reading a register with
one type and writing to another type at the end of an iteration. The only
way that can happen is if the write at the end was to a local variable,
and we're guaranteeing that local variables will be reclaimed at the end
of the iteration.
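The scan can be sketched in Python over a flattened list of statements. `Stmt`, `S` and `is_live_after` are hypothetical, and each statement is assumed to declare which registers it reads (including any output register a primitive reads) and which it writes.

```python
from typing import FrozenSet, List, NamedTuple

class Stmt(NamedTuple):
    reads: FrozenSet[str]   # includes output registers a primitive reads
    writes: FrozenSet[str]

def is_live_after(var: str, stmts: List[Stmt], i: int) -> bool:
    """Is var's current value still needed after stmts[i]?
    A forward scan suffices because Mu live ranges are linear segments:
    no loop-carried dependences, so we never wrap around to a loop's top."""
    for stmt in stmts[i + 1:]:
        if var in stmt.reads:     # checked first: `x <- add 1` reads x
            return True
        if var in stmt.writes:    # overwritten before any read: dead
            return False
    return False

def S(reads=(), writes=()):
    return Stmt(frozenset(reads), frozenset(writes))

prog = [S(writes={"eax"}), S(reads={"eax"}, writes={"ecx"}), S(writes={"eax"})]
print(is_live_after("eax", prog, 0), is_live_after("eax", prog, 1))  # True False
```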
So, the sequence of tasks:
a) compute register liveness
b1) verify that all register variables used at any point in a program
are always the topmost use of that register.
b2) decide whether to spill/shadow, clobber or flag an error.
There's still the open question of where to attach liveness state. It can't
be on a var, because liveness varies by use of the var. It can't be on a
statement because we may want to know the liveness of variables not referenced
in a given statement. Conceptually we want a matrix of locals x stmts (flattened).
But I think it's simpler than that. We just want to know liveness at the
time of variable declarations. A new register variable can be in one of
three states w.r.t. its previous definition: either it's shadowing it,
or it can clobber it, or there's a conflict and we need to raise an error.
I think we can compute this information for each variable definition by
an analysis similar to existing ones, maintaining a stack of variable definitions.
The major difference is that we don't pop variables when a block ends.
Details to be worked out. But when we do I hope to get these pending tests
passing.
This is a lot of code for a single test, and it took a long time to get
my data model just right. But the test coverage seems ok because it feels
mostly like straight-line code. We'll see.
I've also had to add a lot of prints. We really need app-level trace generation
pretty urgently. That requires deciding how to turn it on/off from the
commandline. And I've been reluctant to start relying on the hairy interface
that is POSIX open().
And we're using it now in factorial.mu!
In the process I had to fix a couple of bugs in pointer dereferencing.
There are still some limitations:
a) Indexing by a literal doesn't work yet.
b) Only arrays of ints supported so far.
Looking ahead, I'm not sure how I can support indexing arrays by non-literals
(variables in registers) unless the element size is a power of 2.
I'd been thinking I didn't need unconditional `break` instructions, but
I just realized that non-local unconditional breaks have a use. Stop overthinking
this; just support everything.
The code is quite duplicated.
We'll be doing type-checking in a separate phase in future. For now we
need only to distinguish between literals and non-literals for x86 primitive
instructions.
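Here is that minimal distinction as a Python sketch for one primitive. `emit_add_to` and the literal syntax it accepts (non-negative decimal or `0x` hex) are assumptions. The encodings are x86's `81 /0` (add imm32 to r/m32) and `01` (add r32 to r/m32), written in the SubX style this log uses. Immediates are re-emitted in hex because SubX reads numbers as hex.

```python
import re

REGISTER_NUMBERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
                    "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
LITERAL = re.compile(r"(0x[0-9a-fA-F]+|[0-9]+)$")  # non-negative only

def is_literal(operand: str) -> bool:
    return LITERAL.match(operand) is not None

def literal_value(tok: str) -> int:
    return int(tok[2:], 16) if tok.startswith("0x") else int(tok, 10)

def emit_add_to(dest: str, operand: str) -> str:
    """Choose an x86 encoding for `dest <- add operand` (dest a register)
    purely on whether the operand is a literal. No type checking."""
    if is_literal(operand):
        # 81 /0: add imm32 to r/m32.
        return f"81 0/subop/add %{dest} 0x{literal_value(operand):x}/imm32"
    # 01: add r32 to r/m32, i.e. add the operand's register to dest.
    return f"01/add-to %{dest} {REGISTER_NUMBERS[operand]}/r32/{operand}"

print(emit_add_to("eax", "10"))   # 81 0/subop/add %eax 0xa/imm32
print(emit_add_to("eax", "ecx"))  # 01/add-to %eax 1/r32/ecx
```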
I was tempted to support x86 set__ instructions for this change:
https://c9x.me/x86/html/file_module_x86_id_288.html
That will happen at some point. And I'll simplify a bunch of branches for
results of predicate functions when it happens.
This cleans up a bunch of little warts that had historically accumulated
because of my bull-headedness in not designing a grammar up front. Let's
see if the lack of a grammar comes up again.
We now require that there be no space in variable declarations between
the name and the colon separating it from its type.
Allow comments at the end of all kinds of statements.
To do this I replaced all calls to next-word with next-mu-token, except
one. I'm not seeing any bugs yet, any places where comments break things.
But this exception makes me nervous.
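Here is a Python sketch of both rules: a tokenizer that stops at a `#` comment, and a declaration parser that rejects a space before the colon. `mu_tokens` and `parse_var_decl` are hypothetical stand-ins, not `next-mu-token` itself. The sketch assumes comments start with `#` and that the line has no string literals.

```python
def mu_tokens(line: str):
    """Yield whitespace-separated tokens, stopping at a '#' comment.
    Simplification: ignores string literals, where '#' could be data."""
    for word in line.split():
        if word.startswith("#"):
            return
        yield word

def parse_var_decl(line: str):
    """Parse a bare `var name: type` declaration, with an optional trailing comment."""
    toks = list(mu_tokens(line))
    if len(toks) < 3 or toks[0] != "var":
        raise ValueError("expected: var name: type")
    name = toks[1]
    if not name.endswith(":") or name == ":":
        raise ValueError("no space allowed between a variable's name and its colon")
    return name[:-1], " ".join(toks[2:])

print(parse_var_decl("var x: int  # loop counter"))  # ('x', 'int')
```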
Support calling SubX code from Mu. I have _zero_ idea how to make this
safe.
Now we can start writing tests. We can't use commandline args yet. That
requires support for kernel strings.
I've been saying that we can convert this:
  {
    var x: int
    break-if-=
    ...
  }
..into this:
  {
    68/push 0/imm32
    {
      0f 84/jump-if-= break/disp32
      ...
    }
    81 0/subop/add %esp 4/imm32
  }
All subsequent instructions go into a nested block, so that they can be
easily skipped without skipping the stack cleanup.
However, I've been growing aware that this is a special case. Most of the
time we can't use this trick:
  for loops
  for non-local breaks
  for non-local loops
In most cases we need to figure out all the intervening variables on the
stack and emit code to clean them up.
And now it turns out even for local breaks like above, the trick doesn't
work. Consider what happens when there's a loop later in the block:
  {
    var x: int
    break-if-=
    ...
    loop
  }
If we emitted a nested block for the break, the local loop would become
non-local. So we replace one kind of state with another.
The easiest course of action is to just emit the exact same cleanup code for
all conditional branches.
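Here is one way to emit identical cleanup for every conditional branch, as a Python sketch: invert the condition to jump over a copy of the cleanup, then jump unconditionally to the real target. This inversion trick is a standard technique offered for illustration, not necessarily what the Mu compiler emits. `emit_conditional_break`, its arguments and the `$foo:break` label are hypothetical, and only `=` and `!=` are handled.

```python
# Conditional jumps used here: 0f 84 (je/jz) and 0f 85 (jne/jnz).
JUMP_IF = {"=": "0f 84", "!=": "0f 85"}
NEGATE = {"=": "!=", "!=": "="}

def emit_conditional_break(cond, cleanups, target):
    """Emit `break-if-<cond>` so the cleanup runs only when the branch is taken.

    cleanups: one SubX instruction per var to undo, innermost first.
    target:   label at the end of the block being exited (hypothetical name).
    """
    inverse = NEGATE[cond]
    code = ["{"]
    # Condition false: skip to the end of this mini-block, past the cleanup.
    code.append(f"{JUMP_IF[inverse]}/jump-if-{inverse} break/disp32")
    code.extend(cleanups)
    code.append(f"e9/jump {target}/disp32")  # e9: unconditional jmp rel32
    code.append("}")
    return code

for line in emit_conditional_break("=", ["81 0/subop/add %esp 4/imm32"], "$foo:break"):
    print(line)
```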
Turns out we can't handle them like conditional loops.
This function to emit cleanup code for jumps is getting quite terrible.
I don't yet know what subsidiary abstractions it needs.
Start pushing dummy vars for labels on the stack as we encounter them.
This won't affect cleanup code, but will make it easy to ensure that jumps
are well-structured.
Clean up data structures and eliminate the notion of named blocks.
Named blocks still exist in the Mu language. But they get parsed into a
uniform block data structure, same as unnamed blocks.
This will come in handy for the remaining cases where we need to clean
up locals on the stack:
  loop after var
  non-local break with vars in intervening blocks
  non-local loop with vars in intervening blocks
Before:
we detected labels using a '$' at the start of an arg, and turned them
into literals.
After:
we put labels on the var stack and let the regular lookup of the var
stack handle labels.
This adds complexity in one place and removes it from another. The crucial
benefit is that it allows us to store a block depth for each label. That
will come in handy later.
All this works only because of a salubrious coincidence: Mu labels are
always at the start of a block, and jumps always refer to the name at the
start of a block, even when the jump is in the forwards direction. So we
never see label uses before definitions.
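Here is a Python sketch of a shared stack of vars and labels, showing how the stored depth tells a labeled break or loop which vars to unwind. `VarStack` and its methods are hypothetical. The sketch relies on the property just described: a label is pushed on entering its block, before any var in it.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Entry:
    name: str
    depth: int              # depth of the block it was pushed in
    is_label: bool = False

class VarStack:
    """Vars and labels share one stack, so ordinary lookup finds labels too."""

    def __init__(self):
        self.entries: List[Entry] = []

    def push_label(self, name: str, depth: int):
        self.entries.append(Entry(name, depth, is_label=True))

    def push_var(self, name: str, depth: int):
        self.entries.append(Entry(name, depth))

    def lookup(self, name: str) -> Entry:
        for entry in reversed(self.entries):  # innermost definition wins
            if entry.name == name:
                return entry
        raise KeyError(name)

    def pop_block(self, depth: int):
        """Discard everything pushed at `depth` or deeper when that block closes."""
        while self.entries and self.entries[-1].depth >= depth:
            self.entries.pop()

    def vars_to_unwind(self, label: str) -> List[str]:
        """Vars to clean up, innermost first, before breaking out of the block
        named `label` or looping back to its start (only the jump differs)."""
        target = self.lookup(label)
        if not target.is_label:
            raise ValueError(f"{label} is not a label")
        return [e.name for e in reversed(self.entries)
                if not e.is_label and e.depth >= target.depth]
```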
Note on CI: this currently only works natively, not emulated.
So far we only handle unlabeled break instructions correctly. That part
is elegance itself. But the rest will need more work:
a) For labeled breaks we need to insert code to unwind all intervening
blocks.
b) For unlabeled loops we need to insert code to unwind the current block
and then loop.
c) For labeled loops we need to insert code to unwind all intervening blocks
and then loop.
Is this even worth doing? I think so. It's pretty common for a conditional
block inside a loop to 'continue'. That requires looping to somewhere non-local.