Re-sync markdown files with mu-normie fork.
2020-07-12 16:37:58 -07:00 · 7817fdb29c
parent ac8b37f9a8
commit 7817fdb29c
5 changed files with 265 additions and 259 deletions
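
Most hunks below apply one mechanical change: fenced code blocks that were indented by two spaces are moved to column zero, with their contents shifted left by the same amount. A minimal sketch of that transformation, assuming a two-space indent; the function name `dedent_fenced_blocks` is hypothetical and this is not the script (if any) used to produce the commit:

```python
def dedent_fenced_blocks(text: str, indent: str = "  ") -> str:
    """Shift fenced code blocks left when their opening fence starts with `indent`.

    Hypothetical helper for illustration only. It does not special-case
    fences nested inside list items, which this commit leaves indented.
    """
    out = []
    in_block = False
    for line in text.split("\n"):
        # Remove one level of indent if the line has it; otherwise keep as-is.
        stripped = line[len(indent):] if line.startswith(indent) else line
        if not in_block:
            if line.startswith(indent + "```"):
                in_block = True          # indented opening fence found
                out.append(stripped)
            else:
                out.append(line)         # prose outside blocks is untouched
        else:
            out.append(stripped)
            if line.strip() == "```":
                in_block = False         # closing fence ends the block
    return "\n".join(out)
```

Lines nested deeper inside a block (such as the instructions under `Entry:`) keep their relative indentation, which matches what the hunks show.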
--- a/README.md
+++ b/README.md
@@ -7,14 +7,14 @@ Mu is not designed to operate in large clusters providing services for
 millions of people. Mu is designed for _you_, to run one computer. (Or a few.)
 Running the code you want to run, and nothing else.

-  ```sh
-  $ git clone https://github.com/akkartik/mu
-  $ cd mu
-  $ ./translate_mu apps/ex2.mu  # emit a.elf
-  $ ./a.elf  # adds 3 and 4
-  $ echo $?
-  7
-  ```
+```sh
+$ git clone https://github.com/akkartik/mu
+$ cd mu
+$ ./translate_mu apps/ex2.mu  # emit a.elf
+$ ./a.elf  # adds 3 and 4
+$ echo $?
+7
+```

 [![Build Status](https://api.travis-ci.org/akkartik/mu.svg?branch=master)](https://travis-ci.org/akkartik/mu)

@@ -82,26 +82,26 @@ result in good error messages.
 Once generated, ELF binaries can be packaged up with a Linux kernel into a
 bootable disk image:

-  ```sh
-  $ ./translate_mu apps/ex2.mu  # emit a.elf
-  # dependencies
-  $ sudo apt install build-essential flex bison wget libelf-dev libssl-dev xorriso
-  $ tools/iso/linux a.elf
-  $ qemu-system-x86_64 -m 256M -cdrom mu_linux.iso -boot d
-  ```
+```sh
+$ ./translate_mu apps/ex2.mu  # emit a.elf
+# dependencies
+$ sudo apt install build-essential flex bison wget libelf-dev libssl-dev xorriso
+$ tools/iso/linux a.elf
+$ qemu-system-x86_64 -m 256M -cdrom mu_linux.iso -boot d
+```

 The disk image also runs on [any cloud server that supports custom images](http://akkartik.name/post/iso-on-linode).

 Mu also runs on the minimal hobbyist OS [Soso](https://github.com/ozkl/soso).
 (Requires graphics and sudo access. Currently doesn't work on a cloud server.)

-  ```sh
-  $ ./translate_mu apps/ex2.mu  # emit a.elf
-  # dependencies
-  $ sudo apt install build-essential util-linux nasm xorriso  # maybe also dosfstools and mtools
-  $ tools/iso/soso a.elf  # requires sudo
-  $ qemu-system-i386 -cdrom mu_soso.iso
-  ```
+```sh
+$ ./translate_mu apps/ex2.mu  # emit a.elf
+# dependencies
+$ sudo apt install build-essential util-linux nasm xorriso  # maybe also dosfstools and mtools
+$ tools/iso/soso a.elf  # requires sudo
+$ qemu-system-i386 -cdrom mu_soso.iso
+```

 ## Syntax

@@ -121,16 +121,16 @@ Here's an example program in Mu:

 Here's an example program in SubX:

-  ```sh
-  == code
-  Entry:
-    # ebx = 1
-    bb/copy-to-ebx  1/imm32
-    # increment ebx
-    43/increment-ebx
-    # exit(ebx)
-    e8/call  syscall_exit/disp32
-  ```
+```sh
+== code
+Entry:
+  # ebx = 1
+  bb/copy-to-ebx  1/imm32
+  # increment ebx
+  43/increment-ebx
+  # exit(ebx)
+  e8/call  syscall_exit/disp32
+```

 [More details on SubX syntax &rarr;](subx.md)

--- a/mu.md
+++ b/mu.md
@@ -46,13 +46,13 @@ Zooming out from single statements, here's a complete sample program in Mu:

 Mu programs are lists of functions. Each function has the following form:

-  ```
-  fn _name_ _inout_ ... -> _output_ ... {
-    _statement_
-    _statement_
-    ...
-  }
-  ```
+```
+fn _name_ _inout_ ... -> _output_ ... {
+  _statement_
+  _statement_
+  ...
+}
+```

 Each function has a header line, and some number of statements, each on a
 separate line. Headers describe inouts and outputs. Inouts can't be registers,
@@ -64,15 +64,15 @@ outputs in registers, and modify inouts passed in by reference. In addition,
 there's one more constraint: output registers must match the function header.
 For example:

-  ```
-  fn f -> x/eax: int {
-    ...
-  }
-  fn g {
-    a/eax <- f  # ok
-    a/ebx <- f  # wrong
-  }
-  ```
+```
+fn f -> x/eax: int {
+  ...
+}
+fn g {
+  a/eax <- f  # ok
+  a/ebx <- f  # wrong
+}
+```

 The function `main` is special; it is where the program starts running. It
 must always return a single int in register `ebx` (as the exit status of the
@@ -92,23 +92,23 @@ and `}`, both each alone on a line.

 Blocks can nest:

-  ```
+```
+{
+  _statements_
  {
-    _statements_
-    {
-      _more statements_
-    }
+    _more statements_
  }
-  ```
+}
+```

 Blocks can be named (with the name ending in a `:` on the same line as the
 `{`):

-  ```
-  $name: {
-    _statements_
-  }
-  ```
+```
+$name: {
+  _statements_
+}
+```

 Further down we'll see primitive statements for skipping or repeating blocks.
 Besides control flow, the other use for blocks is...
@@ -119,10 +119,10 @@ Functions can define new variables at any time with the keyword `var`. There
 are two variants of the `var` statement, for defining variables in registers
 or memory.

-  ```
-  var name: type
-  var name/reg: type <- ...
-  ```
+```
+var name: type
+var name/reg: type <- ...
+```

 Variables on the stack are never initialized. (They're always implicitly
 zeroed out.) Variables in registers are always initialized.
@@ -142,54 +142,54 @@ Here is the list of arithmetic primitive operations supported by Mu. The name
 `n` indicates a literal integer rather than a variable, and `var/reg` indicates
 a variable in a register.

-  ```
-  var/reg <- increment
-  increment var
-  var/reg <- decrement
-  decrement var
-  var1/reg1 <- add var2/reg2
-  var/reg <- add var2
-  add-to var1, var2/reg
-  var/reg <- add n
-  add-to var, n
+```
+var/reg <- increment
+increment var
+var/reg <- decrement
+decrement var
+var1/reg1 <- add var2/reg2
+var/reg <- add var2
+add-to var1, var2/reg
+var/reg <- add n
+add-to var, n

-  var1/reg1 <- sub var2/reg2
-  var/reg <- sub var2
-  sub-from var1, var2/reg
-  var/reg <- sub n
-  sub-from var, n
+var1/reg1 <- sub var2/reg2
+var/reg <- sub var2
+sub-from var1, var2/reg
+var/reg <- sub n
+sub-from var, n

-  var1/reg1 <- and var2/reg2
-  var/reg <- and var2
-  and-with var1, var2/reg
-  var/reg <- and n
-  and-with var, n
+var1/reg1 <- and var2/reg2
+var/reg <- and var2
+and-with var1, var2/reg
+var/reg <- and n
+and-with var, n

-  var1/reg1 <- or var2/reg2
-  var/reg <- or var2
-  or-with var1, var2/reg
-  var/reg <- or n
-  or-with var, n
+var1/reg1 <- or var2/reg2
+var/reg <- or var2
+or-with var1, var2/reg
+var/reg <- or n
+or-with var, n

-  var1/reg1 <- xor var2/reg2
-  var/reg <- xor var2
-  xor-with var1, var2/reg
-  var/reg <- xor n
-  xor-with var, n
+var1/reg1 <- xor var2/reg2
+var/reg <- xor var2
+xor-with var1, var2/reg
+var/reg <- xor n
+xor-with var, n

-  var/reg <- copy var2/reg2
-  copy-to var1, var2/reg
-  var/reg <- copy var2
-  var/reg <- copy n
-  copy-to var, n
+var/reg <- copy var2/reg2
+copy-to var1, var2/reg
+var/reg <- copy var2
+var/reg <- copy n
+copy-to var, n

-  compare var1, var2/reg
-  compare var1/reg, var2
-  compare var/eax, n
-  compare var, n
+compare var1, var2/reg
+compare var1/reg, var2
+compare var/eax, n
+compare var, n

-  var/reg <- multiply var2
-  ```
+var/reg <- multiply var2
+```

 Any statement above that takes a variable in memory can be replaced with a
 dereference (`*`) of an address variable (of type `(addr ...)`) in a register.
@@ -211,11 +211,11 @@ Since most x86 instructions implicitly load 32 bits at a time from memory,
 variables of type 'byte' are only allowed in registers, not on the stack. Here
 are the possible statements for reading and writing bytes in memory:

-  ```
-  var/reg <- copy-byte var2/reg2      # var: byte, var2: byte
-  var/reg <- copy-byte *var2/reg2     # var: byte, var2: (addr byte)
-  copy-byte-to *var1/reg1, var2/reg2  # var1: (addr byte), var2: byte
-  ```
+```
+var/reg <- copy-byte var2/reg2      # var: byte, var2: byte
+var/reg <- copy-byte *var2/reg2     # var: byte, var2: (addr byte)
+copy-byte-to *var1/reg1, var2/reg2  # var1: (addr byte), var2: byte
+```

 In addition, variables of type 'byte' are restricted to (the lowest bytes of)
 just 4 registers: eax, ecx, edx and ebx.
@@ -228,9 +228,9 @@ jump to the beginning of the containing block.

 All jumps can take an optional label starting with '$':

-  ```
-  loop $foo
-  ```
+```
+loop $foo
+```

 This instruction jumps to the beginning of the block called $foo. The corresponding
 `break` jumps to the end of the block. Either jump statement must lie somewhere
@@ -239,83 +239,83 @@ blocks with restraint; jumps to places far away can get confusing.)

 There are two unconditional jumps:

-  ```
-  loop
-  loop label
-  break
-  break label
-  ```
+```
+loop
+loop label
+break
+break label
+```

 The remaining jump instructions are all conditional. Conditional jumps rely on
 the result of the most recently executed `compare` instruction. (To keep
 programs easy to read, keep compare instructions close to the jump that uses
 them.)

-  ```
-  break-if-=
-  break-if-= label
-  break-if-!=
-  break-if-!= label
-  ```
+```
+break-if-=
+break-if-= label
+break-if-!=
+break-if-!= label
+```

 Inequalities are similar, but have unsigned and signed variants. For simplicity,
 always use signed integers; use the unsigned variants only to compare addresses.

-  ```
-  break-if-<
-  break-if-< label
-  break-if->
-  break-if-> label
-  break-if-<=
-  break-if-<= label
-  break-if->=
-  break-if->= label
+```
+break-if-<
+break-if-< label
+break-if->
+break-if-> label
+break-if-<=
+break-if-<= label
+break-if->=
+break-if->= label

-  break-if-addr<
-  break-if-addr< label
-  break-if-addr>
-  break-if-addr> label
-  break-if-addr<=
-  break-if-addr<= label
-  break-if-addr>=
-  break-if-addr>= label
-  ```
+break-if-addr<
+break-if-addr< label
+break-if-addr>
+break-if-addr> label
+break-if-addr<=
+break-if-addr<= label
+break-if-addr>=
+break-if-addr>= label
+```

 Similarly, conditional loops:

-  ```
-  loop-if-=
-  loop-if-= label
-  loop-if-!=
-  loop-if-!= label
+```
+loop-if-=
+loop-if-= label
+loop-if-!=
+loop-if-!= label

-  loop-if-<
-  loop-if-< label
-  loop-if->
-  loop-if-> label
-  loop-if-<=
-  loop-if-<= label
-  loop-if->=
-  loop-if->= label
+loop-if-<
+loop-if-< label
+loop-if->
+loop-if-> label
+loop-if-<=
+loop-if-<= label
+loop-if->=
+loop-if->= label

-  loop-if-addr<
-  loop-if-addr< label
-  loop-if-addr>
-  loop-if-addr> label
-  loop-if-addr<=
-  loop-if-addr<= label
-  loop-if-addr>=
-  loop-if-addr>= label
-  ```
+loop-if-addr<
+loop-if-addr< label
+loop-if-addr>
+loop-if-addr> label
+loop-if-addr<=
+loop-if-addr<= label
+loop-if-addr>=
+loop-if-addr>= label
+```

 ## Addresses

 Passing objects by reference requires the `address` operation, which returns
 an object of type `addr`.

-  ```
-  var/reg: (addr T) <- address var2: T
-  ```
+```
+var/reg: (addr T) <- address var2: T
+```

 Here `var2` can't live in a register.

@@ -325,24 +325,24 @@ Mu arrays are size-prefixed so that operations on them can check bounds as
 necessary at run-time. The `length` statement returns the number of elements
 in an array.

-  ```
-  var/reg: int <- length arr/reg: (addr array T)
-  ```
+```
+var/reg: int <- length arr/reg: (addr array T)
+```

 The `index` statement takes an `addr` to an `array` and returns an `addr` to
 one of its elements, that can be read from or written to.

-  ```
-  var/reg: (addr T) <- index arr/reg: (addr array T), n
-  var/reg: (addr T) <- index arr: (array T sz), n
-  ```
+```
+var/reg: (addr T) <- index arr/reg: (addr array T), n
+var/reg: (addr T) <- index arr: (array T sz), n
+```

 The index can also be a variable in a register, with a caveat:

-  ```
-  var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: int
-  var/reg: (addr T) <- index arr: (array T sz), idx/reg: int
-  ```
+```
+var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: int
+var/reg: (addr T) <- index arr: (array T sz), idx/reg: int
+```

 The caveat: the size of T must be 1, 2, 4 or 8 bytes. The x86 instruction set
 has complex addressing modes that can index into an array in a single instruction
@@ -351,30 +351,30 @@ in these situations.
 For types in general you'll need to split up the work, performing a `compute-offset`
 before the `index`.

-  ```
-  var/reg: (offset T) <- compute-offset arr: (addr array T), idx/reg: int     # arr can be in reg or mem
-  var/reg: (offset T) <- compute-offset arr: (addr array T), idx: int         # arr can be in reg or mem
-  ```
+```
+var/reg: (offset T) <- compute-offset arr: (addr array T), idx/reg: int     # arr can be in reg or mem
+var/reg: (offset T) <- compute-offset arr: (addr array T), idx: int         # arr can be in reg or mem
+```

 The `compute-offset` statement returns a value of type `(offset T)` after
 performing any necessary bounds checking. Now the offset can be passed to
 `index` as usual:

-  ```
-  var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: (offset T)
-  ```
+```
+var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: (offset T)
+```

 ## Compound types

 Primitive types can be combined together using the `type` keyword. For
 example:

-  ```
-  type point {
-    x: int
-    y: int
-  }
-  ```
+```
+type point {
+  x: int
+  y: int
+}
+```

 Mu programs are currently sequences of `fn` and `type` definitions.

@@ -382,19 +382,19 @@ To access within a compound type, use the `get` instruction. There are two
 forms. You need either a variable of the type itself (say `T`) in memory, or a
 variable of type `(addr T)` in a register.

-  ```
-  var/reg: (addr T_f) <- get var/reg: (addr T), f
-  var/reg: (addr T_f) <- get var: T, f
-  ```
+```
+var/reg: (addr T_f) <- get var/reg: (addr T), f
+var/reg: (addr T_f) <- get var: T, f
+```

 The `f` here is the field name from the `type` definition, and `T_f` must
 match the type of `f` in the `type` definition. For example, some legal
 instructions for the definition of `point` above:

-  ```
-  var a/eax: (addr int) <- get p, x
-  var a/eax: (addr int) <- get p, y
-  ```
+```
+var a/eax: (addr int) <- get p, x
+var a/eax: (addr int) <- get p, y
+```

 ## Handles for safe access to the heap

@@ -407,9 +407,9 @@ security issues or hard-to-debug misbehavior.
 To actually _use_ a `handle`, we have to turn it into an `addr` first using
 the `lookup` statement.

-  ```
-  var y/reg: (addr T) <- lookup x
-  ```
+```
+var y/reg: (addr T) <- lookup x
+```

 Now operate on the `addr` as usual, safe in the knowledge that you can later
 recover any writes to its payload from `x`.
@@ -433,26 +433,26 @@ Try to avoid mixing these use cases.

 You can copy handles to another variable on the stack like this:

-  ```
-  var x: (handle T)
-  # ..some code initializing x..
-  var y/eax: (addr handle T) <- address ...
-  copy-handle x, y
-  ```
+```
+var x: (handle T)
+# ..some code initializing x..
+var y/eax: (addr handle T) <- address ...
+copy-handle x, y
+```

 You can also save handles inside compound types like this:

-  ```
-  var y/reg: (addr handle T_f) <- get var: (addr T), f
-  copy-handle-to *y, x
-  ```
+```
+var y/reg: (addr handle T_f) <- get var: (addr T), f
+copy-handle-to *y, x
+```

 Or this:

-  ```
-  var y/reg: (addr handle T) <- index arr: (addr array handle T), n
-  copy-handle-to *y, x
-  ```
+```
+var y/reg: (addr handle T) <- index arr: (addr array handle T), n
+copy-handle-to *y, x
+```

 ## Conclusion

--- a/subx.md
+++ b/subx.md
@@ -6,16 +6,16 @@ is implemented in SubX and also emits SubX code.
 Here's an example program in SubX that adds 1 and 1 and returns the result to
 the parent shell process:

-  ```sh
-  == code
-  Entry:
-    # ebx = 1
-    bb/copy-to-ebx  1/imm32
-    # increment ebx
-    43/increment-ebx
-    # exit(ebx)
-    e8/call  syscall_exit/disp32
-  ```
+```sh
+== code
+Entry:
+  # ebx = 1
+  bb/copy-to-ebx  1/imm32
+  # increment ebx
+  43/increment-ebx
+  # exit(ebx)
+  e8/call  syscall_exit/disp32
+```

 ## The syntax of SubX instructions

@@ -78,9 +78,9 @@ simpler. It comes from exactly one of the following argument types:
 Putting all this together, here's an example that adds the integer in `eax` to
 the one at address `edx`:

-  ```
-  01/add %edx 0/r32/eax
-  ```
+```
+01/add %edx 0/r32/eax
+```

 ## The syntax of SubX programs

--- a/subx_bare.md
+++ b/subx_bare.md
@@ -41,9 +41,9 @@ contains `4`. Rather than encoding register `esp`, it means the address is
 provided by three _whole new_ arguments (`/base`, `/index` and `/scale`) in a
 _totally_ different way (where `<<` is the left-shift operator):

-  ```
-  reg/mem = *(base + (index << scale))
-  ```
+```
+reg/mem = *(base + (index << scale))
+```

 (There are a couple more exceptions ☹; see [Table 2-2](modrm.pdf) and [Table 2-3](sib.pdf)
 of the Intel manual for the complete story.)
@@ -130,38 +130,38 @@ This repo includes two translators for bare SubX. The first is [the bootstrap
 translator](bootstrap.md) implemented in C++. In addition, you can use SubX to
 translate itself. For example, running natively on Linux:

-  ```sh
-  # generate translator phases using the C++ translator
-  $ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/hex.subx    -o hex
-  $ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/survey.subx -o survey
-  $ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/pack.subx   -o pack
-  $ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/assort.subx -o assort
-  $ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/dquotes.subx -o dquotes
-  $ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/tests.subx  -o tests
-  $ chmod +x hex survey pack assort dquotes tests
+```sh
+# generate translator phases using the C++ translator
+$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/hex.subx    -o hex
+$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/survey.subx -o survey
+$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/pack.subx   -o pack
+$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/assort.subx -o assort
+$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/dquotes.subx -o dquotes
+$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/tests.subx  -o tests
+$ chmod +x hex survey pack assort dquotes tests

-  # use the generated translator phases to translate SubX programs
-  $ cat init.linux apps/ex1.subx |./tests |./dquotes |./assort |./pack |./survey |./hex > a.elf
-  $ chmod +x a.elf
-  $ ./a.elf
-  $ echo $?
-  42
+# use the generated translator phases to translate SubX programs
+$ cat init.linux apps/ex1.subx |./tests |./dquotes |./assort |./pack |./survey |./hex > a.elf
+$ chmod +x a.elf
+$ ./a.elf
+$ echo $?
+42

-  # or, automating the above steps
-  $ ./translate_subx init.linux apps/ex1.subx
-  $ ./a.elf
-  $ echo $?
-  42
-  ```
+# or, automating the above steps
+$ ./translate_subx init.linux apps/ex1.subx
+$ ./a.elf
+$ echo $?
+42
+```

 Or, running in a VM on other platforms (much slower):

-  ```sh
-  $ ./translate_subx_emulated init.linux apps/ex1.subx  # generates identical a.elf to above
-  $ ./bootstrap run a.elf
-  $ echo $?
-  42
-  ```
+```sh
+$ ./translate_subx_emulated init.linux apps/ex1.subx  # generates identical a.elf to above
+$ ./bootstrap run a.elf
+$ echo $?
+42
+```

 ## Resources

--- a/subx_debugging.md
+++ b/subx_debugging.md
@@ -25,20 +25,24 @@ rudimentary but hopefully still workable toolkit:

 - Generate a trace for the failing test while running your program in emulated
  mode (`bootstrap run`):
+
  ```
  $ ./bootstrap translate input.subx -o binary
  $ ./bootstrap --trace run binary arg1 arg2  2>trace
  ```
+
  The ability to generate a trace is the essential reason for the existence of
  `bootstrap run` mode. It gives far better visibility into program internals than
  running natively.

 - As a further refinement, it is possible to render label names in the trace
  by adding a second flag to the `bootstrap translate` command:
+
  ```
  $ ./bootstrap --debug translate input.subx -o binary
  $ ./bootstrap --trace run binary arg1 arg2  2>trace
  ```
+
  `bootstrap --debug translate` emits a mapping from label to address in a file
  called `labels`. `bootstrap --trace run` reads in the `labels` file if
  it exists and prints out any matching label name as it traces each instruction
@@ -57,9 +61,11 @@ rudimentary but hopefully still workable toolkit:
  some loop. Function names get similar `run: == label` lines.

 - One trick when emitting traces with labels:
+
  ```
  $ grep label trace
  ```
+
  This is useful for quickly showing you the control flow for the run, and the
  function executing when the error occurred. I find it useful to start with
  this information, only looking at the complete trace after I've gotten