7251

2020-11-16 00:30:58 -08:00 · 2020-11-16 00:30:58 -08:00 · 3d31467c0d
parent 8d2dece291
commit 3d31467c0d
4 changed files with 53 additions and 50 deletions
--- a/apps/mu
+++ b/apps/mu
--- a/apps/mu.subx
+++ b/apps/mu.subx
@ -23296,7 +23296,7 @@ $check-mu-populate-stmt:get-length:
    89/<- %esi 0/r32/eax
    # 1 inout
    3d/compare-eax-and 0/imm32
-    0f 84/jump-if-= $check-mu-copy-object-stmt:error-incorrect-inouts/disp32
+    0f 84/jump-if-= $check-mu-populate-stmt:error-incorrect-inouts/disp32
    # > 2 inouts
    (lookup *(esi+8) *(esi+0xc))  # Stmt-var-next Stmt-var-next => eax
    3d/compare-eax-and 0/imm32
@ -23446,7 +23446,7 @@ $check-mu-populate-stream-stmt:get-length:
    89/<- %esi 0/r32/eax
    # 1 inout
    3d/compare-eax-and 0/imm32
-    0f 84/jump-if-= $check-mu-copy-object-stmt:error-incorrect-inouts/disp32
+    0f 84/jump-if-= $check-mu-populate-stream-stmt:error-incorrect-inouts/disp32
    # > 2 inouts
    (lookup *(esi+8) *(esi+0xc))  # Stmt-var-next Stmt-var-next => eax
    3d/compare-eax-and 0/imm32
--- a/html/apps/mu.subx.html
+++ b/html/apps/mu.subx.html
@ -21906,7 +21906,7 @@ if ('onhashchange' in window) {
 <span id="L23296" class="LineNr">23296 </span>    89/&lt;- %esi 0/r32/eax
 <span id="L23297" class="LineNr">23297 </span>    <span class="subxComment"># 1 inout</span>
 <span id="L23298" class="LineNr">23298 </span>    3d/compare-eax-and 0/imm32
-<span id="L23299" class="LineNr">23299 </span>    0f 84/jump-if-= $check-mu-copy-object-stmt:error-incorrect-inouts/disp32
+<span id="L23299" class="LineNr">23299 </span>    0f 84/jump-if-= $check-mu-populate-stmt:error-incorrect-inouts/disp32
 <span id="L23300" class="LineNr">23300 </span>    <span class="subxComment"># &gt; 2 inouts</span>
 <span id="L23301" class="LineNr">23301 </span>    (<a href='../120allocate.subx.html#L256'>lookup</a> *(esi+8) *(esi+0xc))  <span class="subxComment"># Stmt-var-next Stmt-var-next =&gt; eax</span>
 <span id="L23302" class="LineNr">23302 </span>    3d/compare-eax-and 0/imm32
@ -22056,7 +22056,7 @@ if ('onhashchange' in window) {
 <span id="L23446" class="LineNr">23446 </span>    89/&lt;- %esi 0/r32/eax
 <span id="L23447" class="LineNr">23447 </span>    <span class="subxComment"># 1 inout</span>
 <span id="L23448" class="LineNr">23448 </span>    3d/compare-eax-and 0/imm32
-<span id="L23449" class="LineNr">23449 </span>    0f 84/jump-if-= $check-mu-copy-object-stmt:error-incorrect-inouts/disp32
+<span id="L23449" class="LineNr">23449 </span>    0f 84/jump-if-= $check-mu-populate-stream-stmt:error-incorrect-inouts/disp32
 <span id="L23450" class="LineNr">23450 </span>    <span class="subxComment"># &gt; 2 inouts</span>
 <span id="L23451" class="LineNr">23451 </span>    (<a href='../120allocate.subx.html#L256'>lookup</a> *(esi+8) *(esi+0xc))  <span class="subxComment"># Stmt-var-next Stmt-var-next =&gt; eax</span>
 <span id="L23452" class="LineNr">23452 </span>    3d/compare-eax-and 0/imm32
--- a/mu.md
+++ b/mu.md
@ -45,7 +45,7 @@ and [vocabulary.md](vocabulary.md).

 Zooming out from single statements, here's a complete sample program in Mu:

-<img alt='ex2.mu' src='html/ex2.mu.png'>
+<img alt='ex2.mu' src='html/ex2.mu.png' width='400px'>

 Mu programs are lists of functions. Each function has the following form:

@ -59,9 +59,7 @@ fn _name_ _inout_ ... -> _output_ ... {

 Each function has a header line, and some number of statements, each on a
 separate line. Headers describe inouts and outputs. Inouts can't be registers,
-and outputs _must_ be registers. Outputs can't take names. In the above
-example, the outputs of both `do-add` and `main` have type `int` and are
-available in register `ebx` at the end of the respective calls.
+and outputs _must_ be registers. Outputs can't take names.

 The above program also demonstrates a function call (to the function `do-add`).
 Function calls look the same as primitive statements: they can return (multiple)
@ -92,7 +90,7 @@ two signatures:
 - `fn main -> _/ebx: int`
 - `fn main args: (addr array (addr array byte)) -> _/ebx: int`

-(The names of the inout and output are flexible.)
+(The name of the inout is flexible.)

 Mu encloses multi-word types in parentheses, and types can get quite expressive.
 For example, you read `main`'s inout type as "an address to an array of
@ -103,7 +101,7 @@ always strings in Mu, you'll quickly learn to mentally shorten this type to
 ## Blocks

 Blocks are useful for grouping related statements. They're delimited by `{`
-and `}`, both each alone on a line.
+and `}`, each alone on a line.

 Blocks can nest:

@ -225,9 +223,8 @@ var/reg <- multiply var2

 Any statement above that takes a variable in memory can be replaced with a
 dereference (`*`) of an address variable (of type `(addr ...)`) in a register.
-(Types can have multiple words, and are wrapped in `()` when they do.) But you
-can't dereference variables in memory. You have to load them into a register
-first.
+You can't dereference variables in memory. You have to load them into a
+register first.

 Excluding dereferences, the above statements must operate on non-address
 values with primitive types: `int`, `boolean` or `byte`. (Booleans are really
@ -238,7 +235,7 @@ to int variables, but not the other way around.

 These instructions may use the floating-point registers `xmm0` ... `xmm7`
 (denoted by `/xreg2` or `/xrm32`). They also use integer values on occasion
-(`/rm32` and `/r32`). They can't take literal floating-point values.
+(`/rm32` and `/r32`).

 ```
 var/xreg <- add var2/xreg2
@ -308,9 +305,8 @@ There are no instructions accepting floating-point literals. To obtain integer
 literals in floating-point registers, copy them to general-purpose registers
 and then convert them to floating-point.

-One pattern you may have noticed above is that the floating-point instructions
-above always write to registers. The only exceptions are `copy` instructions,
-which can write to memory locations.
+The floating-point instructions above always write to registers. The only
+instructions that can write floats to memory are `copy` instructions.

 ```
 var/xreg <- copy var2/xreg2
@ -319,7 +315,8 @@ var/xreg <- copy var2
 var/xreg <- copy *var2/reg2
 ```

-Floating-point comparisons always put a register on the left-hand side:
+Finally, there are floating-point comparisons. They must always put a register
+on the left-hand side:

 ```
 compare var1/xreg1, var2/xreg2
@ -328,7 +325,7 @@ compare var1/xreg1, var2

 ## Operating on individual bytes

-A special-case is variables of type `byte`. Mu is a 32-bit platform so for the
+A special case is variables of type `byte`. Mu is a 32-bit platform so for the
 most part only supports types that are multiples of 32 bits. However, we do
 want to support strings in ASCII and UTF-8, which will be arrays of 8-bit
 bytes.
@ -375,7 +372,7 @@ break label

 The remaining jump instructions are all conditional. Conditional jumps rely on
 the result of the most recently executed `compare` instruction. (To keep
-programs easy to read, keep compare instructions close to the jump that uses
+programs easy to read, keep `compare` instructions close to the jump that uses
 them.)

 ```
@ -571,7 +568,7 @@ type point {

 Mu programs are currently sequences of `fn` and `type` definitions.

-Compound types can't include `addr` types for safety (use `handle` instead,
+Compound types can't include `addr` types for safety reasons (use `handle` instead,
 which is described below). They also can't currently include `array`, `stream`
 or `byte` types. Since arrays and streams carry their size with them, supporting
 them in compound types complicates variable initialization. Instead of
@ -596,39 +593,55 @@ var a/eax: (addr int) <- get p, x
 var a/eax: (addr int) <- get p, y
 ```

+You can clear arbitrary types using the `clear-object` function:
+
+```
+clear-object var: (addr T)
+```
+
+Don't clear arrays or streams using `clear-object`; doing so will irreversibly
+make their length 0 as well.
+
+You can shallow-copy arbitrary types using the `copy-object` function:
+
+```
+copy-object src: (addr T), dest: (addr T)
+```
+
 ## Handles for safe access to the heap

 We've seen the `addr` type, but it's intended to be short-lived. `addr` values
-should never escape from functions. In particular, save `addr` values inside
-compound `type`s. To do that you need a "fat pointer" called a `handle` that
-is safe to keep around for extended periods and ensures it's used safely
-without corrupting the heap and causing security issues or hard-to-debug
-misbehavior.
+should never escape from functions. Function outputs can't be `addr`s,
+function inouts can't include `addr` in their payload type. Finally, you can't
+save `addr` values inside compound `type`s. To do that you need a "fat
+pointer" called a `handle` that is safe to keep around for extended periods
+and ensures it's used safely without corrupting the heap and causing security
+issues or hard-to-debug misbehavior.

 To actually _use_ a `handle`, we have to turn it into an `addr` first using
 the `lookup` statement.

 ```
-var y/reg: (addr T) <- lookup x
+var y/reg: (addr T) <- lookup x: (handle T)
 ```

-Now operate on the `addr` as usual, safe in the knowledge that you can later
-recover any writes to its payload from `x`.
+Now operate on `y` as usual, safe in the knowledge that you can later recover
+any writes to its payload from `x`.

-It's illegal to continue to use this `addr` after a function that reclaims
-heap memory. You have to repeat the lookup from the `handle`. (Luckily Mu
-doesn't implement reclamation yet.)
+It's illegal to continue to use an `addr` after a function that reclaims heap
+memory. You have to repeat the lookup from the `handle`. (Luckily Mu doesn't
+implement reclamation yet.)

 Having two kinds of addresses takes some getting used to. Do we pass in
 variables by value, by `addr` or by `handle`? In inputs or outputs? Here are 3
 rules of thumb:

-  * Functions that need to look at the payload should accept an `(addr ...)`.
+  * Functions that need to look at the payload should accept an `(addr ...)`
+    where possible.
  * Functions that need to treat a handle as a value, without looking at its
-  payload, should accept a `(handle ...)`. Helpers that save handles into data
-  structures are a common example.
-  * Functions that need to allocate memory should accept an `(addr handle
-  ...)`.
+    payload, should accept a `(handle ...)`. Helpers that save handles into
+    data structures are a common example.
+  * Functions that need to allocate memory should accept an `(addr handle ...)`.

 Try to avoid mixing these use cases.

@ -655,7 +668,7 @@ var x: (addr handle T)
 allocate x
 ```

-To create handles to array types, use `populate`:
+To create handles to array types (of potentially dynamic size), use `populate`:

 ```
 var x: (addr handle array T)
@ -663,15 +676,6 @@ var x: (addr handle array T)
 populate x, 3  # array of 3 T's
 ```

-You can copy handles to another variable on the stack like this:
-
-```
-var x: (handle T)
-# ..some code initializing x..
-var y/eax: (addr handle T) <- address ...
-copy-handle x, y
-```
-
 ## Seams

 I said at the start that most instructions map 1:1 to x86 machine code. To
@ -689,8 +693,7 @@ the above exceptions.

 ## Conclusion

-Anything not allowed here is forbidden. Even if the compiler doesn't currently
+Anything not allowed here is forbidden, even if the compiler doesn't currently
 detect and complain about it. Please [contact me](mailto:ak@akkartik.com) or
 [report issues](https://github.com/akkartik/mu/issues) when you encounter a
-missing or misleading error message. Thank you for bearing with the dust! I'm
-here for the long haul, and everything will be clean and checked in due time.
+missing or misleading error message.