The final fix to the raytracing program involves rounding modes. It turns
out x86 processors round floats by default, unlike C which has trained
me to expect truncation. Rather than mess with the MXCSR register, I added
another instruction for truncation. Now milestone 3 emits perfectly correct
results.
I spent some time deciding on the instructions. x87 is a stack ISA, so
not a good fit for the rest of SubX. So we use SSE instead. They operate
on 32-bit floats, which seems like a good fit.
SSE has a bunch of instructions for operating on up to 4 floats at once.
We'll ignore all that and just focus on so-called scalar instructions.
This is a 3-operand instruction:
r32 = rm32 * imm32
It looks like https://c9x.me/x86/html/file_module_x86_id_138.html has a
bug, implying the same opcode supports a 2-operand version. I don't see
that in the Intel manual pdf, or at alternative sites like https://www.felixcloutier.com/x86/imul
Native runs seem to validate my understanding.
In the process I also fixed a bug in the existing multiply instruction
0f af: the only flags it sets are OF and CF. The other existing multiply
instruction f7 was doing things right.