mu/subx at 4cc517e0de7c7759196528bbf1b89f6b9624fe71 - mu

Kartik Agaram 4cc517e0de 4710 Start using write() instead of _write().. and we promptly find a typo when dealing with real file descriptors.		2018-10-17 08:19:03 -07:00
apps	4710	2018-10-17 08:19:03 -07:00
examples	4698	2018-10-14 12:53:50 -07:00
000organization.cc	4426 - error on unrecognized sub-commands	2018-07-26 16:58:54 -07:00
001help.cc	4694	2018-10-13 23:55:07 -07:00
002test.cc	4426 - error on unrecognized sub-commands	2018-07-26 16:58:54 -07:00
003trace.cc	4656	2018-10-02 21:06:46 -07:00
003trace.test.cc	4487	2018-08-05 08:30:48 -07:00
010---vm.cc	4695	2018-10-14 00:00:39 -07:00
011run.cc	4696	2018-10-14 00:13:41 -07:00
012elf.cc	4661	2018-10-04 23:23:48 -07:00
013direct_addressing.cc	4695	2018-10-14 00:00:39 -07:00
014indirect_addressing.cc	4695	2018-10-14 00:00:39 -07:00
015immediate_addressing.cc	4695	2018-10-14 00:00:39 -07:00
016index_addressing.cc	4688	2018-10-12 23:41:43 -07:00
017jump_disp8.cc	4697	2018-10-14 00:29:48 -07:00
018jump_disp16.cc	4697	2018-10-14 00:29:48 -07:00
019functions.cc	4695	2018-10-14 00:00:39 -07:00
020syscalls.cc	4695	2018-10-14 00:00:39 -07:00
021byte_addressing.cc	4695	2018-10-14 00:00:39 -07:00
028translate.cc	4678	2018-10-10 20:54:15 -07:00
029transforms.cc	4519	2018-09-26 10:35:06 -07:00
030---operands.cc	4694	2018-10-13 23:55:07 -07:00
031check_operands.cc	4697	2018-10-14 00:29:48 -07:00
032check_operand_bounds.cc	4694	2018-10-13 23:55:07 -07:00
034compute_segment_address.cc	4668	2018-10-05 21:30:22 -07:00
035labels.cc	4678	2018-10-10 20:54:15 -07:00
036global_variables.cc	4678	2018-10-10 20:54:15 -07:00
038---literal_strings.cc	4668	2018-10-05 21:30:22 -07:00
039debug.cc	4678	2018-10-10 20:54:15 -07:00
040---tests.cc	4667	2018-10-05 19:49:47 -07:00
050_write.subx	4705	2018-10-16 23:27:38 -07:00
051test.subx	4705	2018-10-16 23:27:38 -07:00
052kernel_string_equal.subx	4705	2018-10-16 23:27:38 -07:00
053new_segment.subx	4668	2018-10-05 21:30:22 -07:00
054string_equal.subx	4668	2018-10-05 21:30:22 -07:00
055trace.subx	4708	2018-10-17 06:47:28 -07:00
056write.subx	4710	2018-10-17 08:19:03 -07:00
100index	4499	2018-08-09 21:46:12 -07:00
Readme.md	4638 - extract some common libraries from apps	2018-10-01 12:27:39 -07:00
build	4403	2018-07-25 13:07:01 -07:00
build_and_test_until	4457	2018-07-30 11:31:09 -07:00
cheatsheet.pdf	4026	2017-10-12 09:36:55 -07:00
clean	4462	2018-07-30 20:28:36 -07:00
edit	4568	2018-09-21 13:51:43 -07:00
gen	4638 - extract some common libraries from apps	2018-10-01 12:27:39 -07:00
modrm.pdf	4692 - update online help for subx	2018-10-13 23:18:31 -07:00
opcodes	4671	2018-10-07 12:45:03 -07:00
run	4574	2018-09-21 15:07:42 -07:00
sib.pdf	4692 - update online help for subx	2018-10-13 23:18:31 -07:00
subx	4211	2018-02-20 01:38:15 -08:00
subx.vim	4667	2018-10-05 19:49:47 -07:00
test_apps	4684	2018-10-11 00:08:50 -07:00
test_layers	4682 - subx: start testing all layers of 'library'	2018-10-10 22:22:48 -07:00
vimrc.vim	4020	2017-10-11 02:32:38 -07:00

Readme.md

What is this?

SubX is a thin layer of syntactic sugar over (32-bit x86) machine code. The SubX translator (it's too simple to be called a compiler, or even an assembler) generates ELF binaries that require just a Unix-like kernel to run. (The translator isn't self-hosted yet; generating the binaries does require a C++ compiler and runtime.)

Thin layer of abstraction over machine code, isn't that just an assembler?

Compare some code in Assembly:

add EBX, ECX
copy EBX, 0
copy ECX, 1

..with the same instructions in SubX:

01/add 3/mod/direct 3/rm32/ebx 1/r32/ecx
bb/copy-EBX 0/imm32
b9/copy-ECX 1/imm32

Assembly is pretty low-level, but SubX makes Assembly look like the gleaming chrome of the Starship Enterprise. Opcodes for instructions are explicit, as are addressing modes and the precise bit fields used to encode them. There is no portability. Only a subset of x86 is supported, so there's no backwards compatibility and zero interoperability with existing libraries. Only statically linked libraries are supported, so the kernel will inefficiently juggle multiple copies of the same libraries in RAM.
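
To make "the precise bit fields" concrete, here is a rough sketch in Python (not part of SubX) of how the three instructions above would pack into bytes. The helper names `modrm` and `imm32` are made up for illustration; the bit layout is the standard x86 ModR/M byte, and immediates are little-endian.

```python
# Illustrative only: these helpers are hypothetical, not SubX code.

def modrm(mod, reg, rm):
    """Pack a ModR/M byte: mod in bits 7-6, reg (r32) in bits 5-3, rm in bits 2-0."""
    assert 0 <= mod < 4 and 0 <= reg < 8 and 0 <= rm < 8
    return (mod << 6) | (reg << 3) | rm

def imm32(value):
    """Encode a 32-bit immediate in little-endian byte order."""
    return (value & 0xFFFFFFFF).to_bytes(4, "little")

# 01/add 3/mod/direct 3/rm32/ebx 1/r32/ecx
add_ebx_ecx = bytes([0x01, modrm(mod=3, reg=1, rm=3)])
# bb/copy-EBX 0/imm32
copy_ebx_0 = bytes([0xBB]) + imm32(0)
# b9/copy-ECX 1/imm32
copy_ecx_1 = bytes([0xB9]) + imm32(1)

print(" ".join(f"{b:02x}" for b in add_ebx_ecx))  # 01 cb
```

The first instruction shrinks to two bytes because mod, rm32 and r32 all share a single ModR/M byte.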

In exchange for these drawbacks, SubX will hopefully be simpler to implement. Ideally in itself.

I'm also hoping that SubX will be simpler to program in, that it will fit a programmer's head better in spite of the lack of syntax. Modern Assembly supports roughly 40 years of accretions in the x86 ISA and 40+ years of accumulated cruft in the toolchain (standard library, ELF format, binutils, linker, loader).

You may say I just don't understand the toolchain well enough. And that's the point. I tried, and I failed. Each package above has only a piece of the puzzle. Learning each of the above tools takes time; figuring out how they all work together is not a well-supported activity.

My hypothesis is that it's easier to understand a coherent system written in machine code than an incoherent system in a high-level language. To test this hypothesis, I plan to take a hatchet to anything I don't understand, but to take full ownership of what's left. Not just how it runs, but the experience of programming with it. A few basic mechanisms can hopefully be put together into a more self-explanatory system:

a) Metadata. In the example above, words after a slash (/) act as metadata that doesn't make it into the final binary. Metadata can act as comments for readers, or as directives for tools operating on SubX code. Programmers will be encouraged to create new tools of their own.

b) Checks. While SubX doesn't provide syntax, it tries to provide good guardrails for invalid programs. Metadata specifies which field of an instruction each operand is intended for. Missing operands are caught before they can silently mislead instruction decoding. Instructions with unexpected operand types are immediately flagged. SubX includes an emulator for a subset of x86, which provides better error messages than native execution for certain kinds of bad binaries.

c) A test harness. SubX includes automated tests from the start, and the entire stack is designed to be easy to test. We will provide wrappers for OS syscalls that allow fakes to be dependency-injected in, expanding the kinds of tests that can be written. See the earlier Mu interpreter for more details.

d) Traces of execution. Writing good error messages for a compiler is a hard problem, and it can add complexity. We'd like to keep things ergonomic with a minimum of code, so we will provide a trace browser that allows programmers to scan the trace of events emitted by SubX leading up to an error message, drilling down into details as needed. Traces will also be available in tests, enabling testing for cross-cutting concerns like performance, race conditions, precise error messages displayed on screen, and so on. The effect is again to expand the kinds of tests that can be written.

e) Incremental construction. SubX programs are translated into monolithic ELF binaries, but you will be able to build just a subset of their code (denominated in layers), and get a running program that passes all its automated tests.
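
Mechanisms (a) and (b) can be sketched together in a few lines of Python. This is only an illustration of the idea, not how the translator is implemented: the table of expected operands is a hypothetical stand-in covering just the three opcodes from the earlier example, and the function names are invented.

```python
# Hypothetical table: which operand fields each opcode expects.
# It covers only the opcodes from the example; it is not SubX's real table.
EXPECTED_OPERANDS = {
    "01": {"mod", "rm32", "r32"},
    "bb": {"imm32"},
    "b9": {"imm32"},
}
# Metadata names treated as operand types (assumed subset).
OPERAND_TYPES = {"mod", "rm32", "r32", "imm32"}

def parse_word(word):
    """Split 'value/meta1/meta2' into the value and its list of metadata."""
    value, *metadata = word.split("/")
    return value, metadata

def strip_metadata(line):
    """Keep only the values; everything after a slash is dropped."""
    return [parse_word(word)[0] for word in line.split()]

def check_instruction(line):
    """Return a list of error messages for missing or unexpected operands."""
    words = [parse_word(word) for word in line.split()]
    opcode = words[0][0]
    expected = EXPECTED_OPERANDS.get(opcode)
    if expected is None:
        return [f"unknown opcode {opcode}"]
    found = set()
    for _value, metadata in words[1:]:
        found.update(m for m in metadata if m in OPERAND_TYPES)
    errors = [f"missing {name} operand" for name in sorted(expected - found)]
    errors += [f"unexpected {name} operand" for name in sorted(found - expected)]
    return errors

print(check_instruction("01/add 3/mod/direct 3/rm32/ebx"))
```

Here the metadata does double duty: `add`, `direct` and `ebx` are comments for the reader, while `mod`, `rm32` and `r32` are directives the check relies on. Dropping the `1/r32/ecx` word is reported as a missing `r32` operand instead of silently shifting the bytes that follow.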

It seems wrong-headed that our computers look polished but are plagued by foundational problems of security and reliability. I'd like to learn to walk before I try to run. The plan: start out using the computer only to check my program for errors rather than to hide low-level details. Force myself to think about security by living with raw machine code for a while. Reintroduce high-level languages (HLLs) only after confidence is regained in the foundations (and when the foundations are ergonomic enough to support developing a compiler in them). Delegate only when I can verify with confidence.

Running

$ git clone https://github.com/akkartik/mu
$ cd mu/subx
$ ./subx

Running subx will transparently compile it as necessary.

Usage

subx currently has the following sub-commands:

subx test: runs all automated tests.
subx translate <input file> -o <output ELF binary>: translates a text file containing hex bytes and macros into an executable ELF binary.
subx run <ELF binary>: simulates running the ELF binaries emitted by subx translate. Useful for debugging, and also enables more thorough testing of translate.

Putting them together, build and run one of the example programs:

$ ./subx translate *.subx apps/factorial.subx -o apps/factorial
$ ./subx run apps/factorial  # returns the factorial of 5
$ echo $?
120

If you're running on Linux, factorial will also be runnable directly:

$ apps/factorial
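
Since the example reports its result through the exit status, it's worth remembering that a Unix shell only sees the low 8 bits of that status in `$?`. A small Python sketch of the arithmetic (the `exit_status` helper is hypothetical, purely for illustration):

```python
import math

def exit_status(value):
    """Low 8 bits of a value, which is all the shell reports in $?."""
    return value & 0xFF

print(exit_status(math.factorial(5)))  # 120
print(exit_status(math.factorial(6)))  # 208, because 720 = 2*256 + 208
```

So 5 is the largest input whose factorial survives the trip through the exit status unchanged.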

The examples/ directory shows some simpler programs giving a more gradual introduction to SubX features. The repo includes the binary for all examples. At any commit an example's binary should be identical bit for bit with the result of translating the .subx file. The binary should also be natively runnable on a 32-bit Linux system. If either of these invariants is broken it's a bug on my part. The binary should also be runnable on a 64-bit Linux system. I can't guarantee it, but I'd appreciate hearing if it doesn't run.

However, not all 32-bit Linux binaries are guaranteed to be runnable by subx. I'm not building general infrastructure here for all of the x86 ISA and ELF format. SubX is about programming with a small, regular subset of 32-bit x86:

Only instructions that operate on the 32-bit integer E*X registers. (No floating-point yet.)
Only instructions that assume a flat address space; no instructions that use segment registers.
No instructions that check the carry or parity flags; arithmetic operations always operate on signed integers (while bitwise operations always operate on unsigned integers).
Only relative jump instructions (with 8-bit or 16-bit offsets).
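
As an illustration of what "relative" means for the 8-bit case, here is a Python sketch of how a displacement relates to addresses. It assumes a 2-byte jump instruction (one opcode byte plus one displacement byte), and the function names and addresses are hypothetical. The displacement is a signed byte counted from the end of the jump instruction.

```python
def disp8(jump_address, target_address, instruction_length=2):
    """Encode the signed 8-bit displacement from the end of the jump to the target."""
    offset = target_address - (jump_address + instruction_length)
    if not -128 <= offset <= 127:
        raise ValueError(f"target out of range for an 8-bit offset: {offset}")
    return offset & 0xFF  # two's complement byte

def jump_target(jump_address, disp_byte, instruction_length=2):
    """Decode: sign-extend the byte and add it to the address after the jump."""
    offset = disp_byte - 0x100 if disp_byte >= 0x80 else disp_byte
    return jump_address + instruction_length + offset

print(hex(disp8(0x10, 0x17)))  # 0x5: skip forward 5 bytes past the jump
print(hex(disp8(0x10, 0x10)))  # 0xfe: -2, a jump back to itself
```

Targets further than 127 bytes ahead or 128 bytes behind don't fit in one byte, which is where the wider offset comes in.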

The ELF binaries generated are statically linked and missing a lot of advanced ELF features as well. But they will run.

For more details on programming in this subset, consult the online help:

$ ./subx help

Resources

Inspirations

“Creating tiny ELF executables”
“Bootstrapping a compiler from nothing”
Forth implementations like StoneKnifeForth