mu/subx
Kartik Agaram 4cc517e0de 4710
Start using write() instead of _write().. and we promptly find a typo when
dealing with real file descriptors.
2018-10-17 08:19:03 -07:00
apps 4710 2018-10-17 08:19:03 -07:00
examples 4698 2018-10-14 12:53:50 -07:00
000organization.cc 4426 - error on unrecognized sub-commands 2018-07-26 16:58:54 -07:00
001help.cc 4694 2018-10-13 23:55:07 -07:00
002test.cc 4426 - error on unrecognized sub-commands 2018-07-26 16:58:54 -07:00
003trace.cc 4656 2018-10-02 21:06:46 -07:00
003trace.test.cc 4487 2018-08-05 08:30:48 -07:00
010---vm.cc 4695 2018-10-14 00:00:39 -07:00
011run.cc 4696 2018-10-14 00:13:41 -07:00
012elf.cc 4661 2018-10-04 23:23:48 -07:00
013direct_addressing.cc 4695 2018-10-14 00:00:39 -07:00
014indirect_addressing.cc 4695 2018-10-14 00:00:39 -07:00
015immediate_addressing.cc 4695 2018-10-14 00:00:39 -07:00
016index_addressing.cc 4688 2018-10-12 23:41:43 -07:00
017jump_disp8.cc 4697 2018-10-14 00:29:48 -07:00
018jump_disp16.cc 4697 2018-10-14 00:29:48 -07:00
019functions.cc 4695 2018-10-14 00:00:39 -07:00
020syscalls.cc 4695 2018-10-14 00:00:39 -07:00
021byte_addressing.cc 4695 2018-10-14 00:00:39 -07:00
028translate.cc 4678 2018-10-10 20:54:15 -07:00
029transforms.cc 4519 2018-09-26 10:35:06 -07:00
030---operands.cc 4694 2018-10-13 23:55:07 -07:00
031check_operands.cc 4697 2018-10-14 00:29:48 -07:00
032check_operand_bounds.cc 4694 2018-10-13 23:55:07 -07:00
034compute_segment_address.cc 4668 2018-10-05 21:30:22 -07:00
035labels.cc 4678 2018-10-10 20:54:15 -07:00
036global_variables.cc 4678 2018-10-10 20:54:15 -07:00
038---literal_strings.cc 4668 2018-10-05 21:30:22 -07:00
039debug.cc 4678 2018-10-10 20:54:15 -07:00
040---tests.cc 4667 2018-10-05 19:49:47 -07:00
050_write.subx 4705 2018-10-16 23:27:38 -07:00
051test.subx 4705 2018-10-16 23:27:38 -07:00
052kernel_string_equal.subx 4705 2018-10-16 23:27:38 -07:00
053new_segment.subx 4668 2018-10-05 21:30:22 -07:00
054string_equal.subx 4668 2018-10-05 21:30:22 -07:00
055trace.subx 4708 2018-10-17 06:47:28 -07:00
056write.subx 4710 2018-10-17 08:19:03 -07:00
100index 4499 2018-08-09 21:46:12 -07:00
Readme.md 4638 - extract some common libraries from apps 2018-10-01 12:27:39 -07:00
build 4403 2018-07-25 13:07:01 -07:00
build_and_test_until 4457 2018-07-30 11:31:09 -07:00
cheatsheet.pdf 4026 2017-10-12 09:36:55 -07:00
clean 4462 2018-07-30 20:28:36 -07:00
edit 4568 2018-09-21 13:51:43 -07:00
gen 4638 - extract some common libraries from apps 2018-10-01 12:27:39 -07:00
modrm.pdf 4692 - update online help for subx 2018-10-13 23:18:31 -07:00
opcodes 4671 2018-10-07 12:45:03 -07:00
run 4574 2018-09-21 15:07:42 -07:00
sib.pdf 4692 - update online help for subx 2018-10-13 23:18:31 -07:00
subx 4211 2018-02-20 01:38:15 -08:00
subx.vim 4667 2018-10-05 19:49:47 -07:00
test_apps 4684 2018-10-11 00:08:50 -07:00
test_layers 4682 - subx: start testing all layers of 'library' 2018-10-10 22:22:48 -07:00
vimrc.vim 4020 2017-10-11 02:32:38 -07:00

Readme.md

What is this?

SubX is a thin layer of syntactic sugar over (32-bit x86) machine code. The SubX translator (it's too simple to be called a compiler, or even an assembler) generates ELF binaries that require just a Unix-like kernel to run. (The translator isn't self-hosted yet; generating the binaries does require a C++ compiler and runtime.)

Thin layer of abstraction over machine code, isn't that just an assembler?

Compare some code in Assembly:

add EBX, ECX
copy EBX, 0
copy ECX, 1

..with the same instructions in SubX:

01/add 3/mod/direct 3/rm32/ebx 1/r32/ecx
bb/copy-EBX 0/imm32
b9/copy-ECX 1/imm32

Assembly is pretty low-level, but SubX makes Assembly look like the gleaming chrome of the Starship Enterprise. Opcodes for instructions are explicit, as are addressing modes and the precise bit fields used to encode them. There is no portability. Only a subset of x86 is supported, so there's no backwards compatibility and zero interoperability with existing libraries. Only statically linked libraries are supported, so the kernel will inefficiently juggle multiple copies of the same libraries in RAM.
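To make "the precise bit fields" concrete, here is a minimal Python sketch (not part of SubX; the helper names `modrm` and `imm32` are made up for illustration) of how the three instructions above could end up as bytes. It assumes the standard x86 layout: a ModR/M byte with 2 bits of mod, 3 bits of r32 and 3 bits of rm32, and 32-bit immediates stored little-endian.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the ModR/M byte: mod in bits 7-6, reg (r32) in bits 5-3, rm in bits 2-0."""
    assert 0 <= mod < 4 and 0 <= reg < 8 and 0 <= rm < 8
    return (mod << 6) | (reg << 3) | rm


def imm32(value: int) -> bytes:
    """Encode a 32-bit immediate in little-endian byte order."""
    return (value & 0xFFFFFFFF).to_bytes(4, "little")


# 01/add 3/mod/direct 3/rm32/ebx 1/r32/ecx
add_ebx_ecx = bytes([0x01, modrm(mod=3, reg=1, rm=3)])
# bb/copy-EBX 0/imm32
copy_ebx = bytes([0xBB]) + imm32(0)
# b9/copy-ECX 1/imm32
copy_ecx = bytes([0xB9]) + imm32(1)

for code in (add_ebx_ecx, copy_ebx, copy_ecx):
    print(code.hex(" "))
```

Everything after a slash in the SubX source is absent from these bytes; only the numbers in front of the slashes contribute.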

In exchange for these drawbacks, SubX will hopefully be simpler to implement. Ideally in itself.

I'm also hoping that SubX will be simpler to program in, that it will fit a programmer's head better in spite of the lack of syntax. Modern Assembly supports 40 years of accretions in the x86 ISA and 40+ years of accumulated cruft in the toolchain (standard library, ELF format, binutils, linker, loader).

You may say I just don't understand the toolchain well enough. And that's the point. I tried, and I failed. Each package above has only a piece of the puzzle. Learning each of the above tools takes time; figuring out how they all work together is not a well-supported activity.

My hypothesis is that it's easier to understand a coherent system written in machine code than an incoherent system in a high-level language. To test this hypothesis, I plan to take a hatchet to anything I don't understand, but to take full ownership of what's left. Not just how it runs, but the experience of programming with it. A few basic mechanisms can hopefully be put together into a more self-explanatory system:

a) Metadata. In the example above, words after a slash (/) act as metadata that doesn't make it into the final binary. Metadata can act as comments for readers, or as directives for tools operating on SubX code. Programmers will be encouraged to create new tools of their own.

b) Checks. While SubX doesn't provide syntax, it tries to provide good guardrails for invalid programs. Metadata specifies which field of an instruction each operand is intended for. Missing operands are caught before they can silently mislead instruction decoding. Instructions with unexpected operand types are immediately flagged. SubX includes an emulator for a subset of x86, which provides better error messages than native execution for certain kinds of bad binaries.

c) A test harness. SubX includes automated tests from the start, and the entire stack is designed to be easy to test. We will provide wrappers for OS syscalls that allow fakes to be dependency-injected, expanding the kinds of tests that can be written. See the earlier Mu interpreter for more details.

d) Traces of execution. Writing good error messages for a compiler is a hard problem, and it can add complexity. We'd like to keep things ergonomic with a minimum of code, so we will provide a trace browser that allows programmers to scan the trace of events emitted by SubX leading up to an error message, drilling down into details as needed. Traces will also be available in tests, enabling testing for cross-cutting concerns like performance, race conditions, precise error messages displayed on screen, and so on. The effect is again to expand the kinds of tests that can be written.

e) Incremental construction. SubX programs are translated into monolithic ELF binaries, but you will be able to build just a subset of their code (denominated in layers), and get a running program that passes all its automated tests.
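Mechanisms (a) and (b) can be illustrated together. The following Python sketch is not SubX's implementation: the table `EXPECTED_OPERANDS`, the function names and the wording of the error messages are all assumptions made up for this example. It splits each word at its slashes, treats everything after the first slash as metadata, and uses that metadata to flag missing or unexpected operands.

```python
# Hypothetical table: which operand fields each opcode expects.
EXPECTED_OPERANDS = {
    "01": {"mod", "rm32", "r32"},  # add r32 to rm32
    "bb": {"imm32"},               # copy imm32 to EBX
    "b9": {"imm32"},               # copy imm32 to ECX
}
# Metadata tags that name an instruction field; other tags are plain comments.
OPERAND_TYPES = {"mod", "rm32", "r32", "imm32"}


def parse_word(word):
    """Split 'value/meta1/meta2' into the value and its list of metadata tags."""
    value, *metadata = word.split("/")
    return value, metadata


def check_instruction(line):
    """Return error messages for one instruction (an empty list if it looks fine)."""
    words = [parse_word(w) for w in line.split()]
    opcode, _ = words[0]
    if opcode not in EXPECTED_OPERANDS:
        return [f"'{line}': unknown opcode {opcode}"]
    found = {tag for _, metadata in words[1:] for tag in metadata
             if tag in OPERAND_TYPES}
    expected = EXPECTED_OPERANDS[opcode]
    errors = [f"'{line}': missing {name} operand"
              for name in sorted(expected - found)]
    errors += [f"'{line}': unexpected {name} operand"
               for name in sorted(found - expected)]
    return errors


print(check_instruction("01/add 3/mod/direct 3/rm32/ebx 1/r32/ecx"))
print(check_instruction("01/add 3/mod/direct 3/rm32/ebx"))
print(check_instruction("bb/copy-EBX 0/imm32 1/r32/ecx"))
```

Tags such as `direct` or `ebx` carry no meaning for the check and serve only as comments for the reader, which is the dual role of metadata described in (a).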

It seems wrong-headed that our computers look polished but are plagued by foundational problems of security and reliability. I'd like to learn to walk before I try to run. The plan: start out using the computer only to check my program for errors rather than to hide low-level details. Force myself to think about security by living with raw machine code for a while. Reintroduce high-level languages (HLLs) only after confidence is regained in the foundations (and when the foundations are ergonomic enough to support developing a compiler in them). Delegate only when I can verify with confidence.

Running

$ git clone https://github.com/akkartik/mu
$ cd mu/subx
$ ./subx

Running subx will transparently compile it as necessary.

Usage

subx currently has the following sub-commands:

  • subx test: runs all automated tests.

  • subx translate <input file> -o <output ELF binary>: translates a text file containing hex bytes and macros into an executable ELF binary.

  • subx run <ELF binary>: simulates running the ELF binaries emitted by subx translate. Useful for debugging, and also enables more thorough testing of translate.

Putting them together, build and run one of the example programs:

apps/factorial.subx
$ ./subx translate *.subx apps/factorial.subx -o apps/factorial
$ ./subx run apps/factorial  # returns the factorial of 5
$ echo $?
120  
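The result comes back through the process exit status, which is why `echo $?` shows it. On Unix-like systems the shell only sees the low 8 bits of that status, so this trick works for small results only. A minimal Python sketch of the arithmetic (the helper name `exit_status` is made up for illustration):

```python
import math


def exit_status(result: int) -> int:
    """Low 8 bits of a value, which is all a Unix exit status can carry."""
    return result & 0xFF


print(exit_status(math.factorial(5)))  # 120 fits in 8 bits
print(exit_status(math.factorial(6)))  # 720 does not; only 208 would survive
```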

If you're running on Linux, factorial will also be runnable directly:

$ apps/factorial

The examples/ directory shows some simpler programs giving a more gradual introduction to SubX features. The repo includes the binary for all examples. At any commit an example's binary should be identical bit for bit with the result of translating the .subx file. The binary should also be natively runnable on a 32-bit Linux system. If either of these invariants is broken it's a bug on my part. The binary should also be runnable on a 64-bit Linux system. I can't guarantee it, but I'd appreciate hearing if it doesn't run.
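One way to check the first invariant yourself is to translate an example to a scratch location and compare the result with the committed binary byte for byte. A minimal Python sketch, using only the standard library; the function name and both paths are placeholders to fill in, not files this README promises:

```python
import filecmp


def identical_bit_for_bit(committed_binary: str, fresh_binary: str) -> bool:
    """Compare file contents byte for byte (shallow=False ignores timestamps and stat shortcuts)."""
    return filecmp.cmp(committed_binary, fresh_binary, shallow=False)
```

After running `./subx translate` with `-o` pointing at a scratch path, calling `identical_bit_for_bit(<committed binary>, <scratch path>)` should return `True` if the invariant holds at that commit.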

However, not all 32-bit Linux binaries are guaranteed to be runnable by subx. I'm not building general infrastructure here for all of the x86 ISA and ELF format. SubX is about programming with a small, regular subset of 32-bit x86:

  • Only instructions that operate on the 32-bit integer E*X registers. (No floating-point yet.)
  • Only instructions that assume a flat address space; no instructions that use segment registers.
  • No instructions that check the carry or parity flags; arithmetic operations always operate on signed integers (while bitwise operations always operate on unsigned integers).
  • Only relative jump instructions (with 8-bit or 16-bit offsets).
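Relative jumps are a place where hand-written machine code easily goes wrong, so a worked example may help. The Python sketch below (the function name `jump_disp8` is made up; it is not part of SubX) computes the displacement byte for a 2-byte short jump, relying on the x86 rule that the signed displacement is added to the address of the instruction following the jump.

```python
def jump_disp8(jump_address: int, target_address: int) -> int:
    """Displacement byte for a 2-byte short jump (one opcode byte + one disp8 byte).

    The processor adds the signed displacement to the address of the
    next instruction, i.e. jump_address + 2.
    """
    disp = target_address - (jump_address + 2)
    if not -128 <= disp <= 127:
        raise ValueError(f"target out of disp8 range: {disp}")
    return disp & 0xFF  # two's-complement encoding in a single byte


print(hex(jump_disp8(0x100, 0x110)))  # forward jump
print(hex(jump_disp8(0x100, 0x100)))  # jump to itself: disp is -2
```

A target further away than a signed byte can express raises an error here; that is the situation where the 16-bit offset form is needed instead.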

The ELF binaries generated are statically linked and missing a lot of advanced ELF features as well. But they will run.

For more details on programming in this subset, consult the online help:

$ ./subx help

Resources

Inspirations