5863

Just clarified for myself why `subx translate` and `subx run` need to share code: emulation exists first and foremost to support the tests. In the process we clean up our architecture for grouping layers into levels. It's a good idea, but it goes unused once we reconceive "level 1" as just part of the test harness.
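The key point of this commit is that `subx translate` and `subx run` share a front end: both parse the source and apply the same transforms, and they diverge only at the end (emulating the bytes for tests, or writing a binary). A minimal C++ sketch of that shape is below, under loudly simplified assumptions: `program`, `parse`, `transform`, `front_end`, `run_for_test` and `translate` here are hypothetical toy stand-ins, not SubX's real definitions. The single transform just strips `/metadata` suffixes (so `b8/copy` becomes `b8`), and the test path stops after loading bytes instead of emulating them.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy stand-in for SubX's `program`: a flat list of words
// such as "b8/copy" or "2a".
struct program {
  std::vector<std::string> words;
};

// Shared step 1: split the source text into whitespace-separated words.
void parse(std::istream& in, program& p) {
  std::string w;
  while (in >> w) p.words.push_back(w);
}

// Shared step 2: a single toy transform that strips "/metadata" suffixes,
// so "b8/copy" becomes "b8".
void transform(program& p) {
  for (std::string& w : p.words) {
    std::string::size_type slash = w.find('/');
    if (slash != std::string::npos) w.erase(slash);
  }
}

// The front end both commands share.
program front_end(const std::string& text) {
  program p;
  std::istringstream in(text);
  parse(in, p);
  transform(p);
  return p;
}

// Test path: load each hex word into "memory" (a vector here);
// a real harness would go on to emulate these bytes.
std::vector<std::uint8_t> run_for_test(const std::string& text) {
  program p = front_end(text);
  std::vector<std::uint8_t> memory;
  for (const std::string& w : p.words) {
    unsigned long byte = std::stoul(w, nullptr, 16);
    assert(byte <= 0xff && "each word must encode a single byte");
    memory.push_back(static_cast<std::uint8_t>(byte));
  }
  return memory;
}

// Translate path: emit the same words as text; a real translator
// would write an ELF binary instead.
std::string translate(const std::string& text) {
  program p = front_end(text);
  std::string out;
  for (const std::string& w : p.words) {
    if (!out.empty()) out += ' ';
    out += w;
  }
  return out;
}
```

For example, `translate("b8/copy 2a/imm32 00 00 00")` yields `"b8 2a 00 00 00"`, and `run_for_test` on the same text loads those same five bytes. Routing both commands through one front end is what lets the tests exercise the same parsing and transforms the translator uses.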
2020-01-02 01:28:24 -08:00 · d02aa9ac0b
parent 01013f2ad2
commit d02aa9ac0b
5 changed files with 25 additions and 102 deletions
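Most of this diff moves code between layers using tangle directives such as `:(before "End transform(program& p)")`, which splice a fragment into an earlier layer just before a marker line. That is why `transform()` in 030---translate.cc can start with an empty body: the loop over `Transform` only appears once 031transforms.cc is loaded, and `run()` only calls `transform(p)` once 032---operands.cc inserts that call before the `// Running Test Program` marker. The function below is a hypothetical, simplified model of a single `:(before ...)` directive, assuming labels are matched as substrings of earlier lines; Mu's actual tangle tool supports other directives (such as the `:(code)` seen in this diff), and its exact matching and error rules may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one `:(before "label")` directive: insert `fragment`
// immediately before the first line of `base` that contains `label`.
// Returns false (and leaves `base` unchanged) if no line matches.
bool tangle_before(std::vector<std::string>& base,
                   const std::string& label,
                   const std::vector<std::string>& fragment) {
  assert(!label.empty() && "an empty label would match every line");
  for (std::size_t i = 0; i < base.size(); ++i) {
    if (base[i].find(label) != std::string::npos) {
      // The marker line itself stays in place, below the new fragment.
      base.insert(base.begin() + static_cast<std::ptrdiff_t>(i),
                  fragment.begin(), fragment.end());
      return true;
    }
  }
  return false;
}
```

Applied to the three-line `transform()` from 030---translate.cc with the two-line loop from 031transforms.cc, this yields the function header, the loop, the `// End transform(program& p)` marker, and the closing brace. Because the marker survives, later layers can target the same label again; in this model each new fragment lands just above the marker, so fragments appear in the order their layers are loaded.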
--- a/010---vm.cc
+++ b/010---vm.cc
@@ -1,11 +1,5 @@
-//: Core data structures for simulating the SubX VM (subset of an x86 processor)
-//:
-//: At the lowest level ("level 1") of abstraction, SubX executes x86
-//: instructions provided in the form of an array of bytes, loaded into memory
-//: starting at a specific address.
-//:
-//: SubX is fundamentally a translator. But having a VM to execute its
-//: translations affords greater confidence in it.
+//: Core data structures for simulating the SubX VM (subset of an x86 processor),
+//: either in tests or debug aids.

 //:: registers
 //: assume segment registers are hard-coded to 0
--- a/011run.cc
+++ b/011run.cc
@@ -78,15 +78,14 @@ void test_copy_imm32_to_EAX() {
  );
 }

-// top-level helper for scenarios: parse the input, transform any macros, load
-// the final hex bytes into memory, run it
+// top-level helper for tests: parse the input, load the hex bytes into memory, run
 void run(const string& text_bytes) {
  program p;
  istringstream in(text_bytes);
+  // Loading Test Program
  parse(in, p);
  if (trace_contains_errors()) return;  // if any stage raises errors, stop immediately
-  transform(p);
-  if (trace_contains_errors()) return;
+  // Running Test Program
  load(p);
  if (trace_contains_errors()) return;
  // convenience to keep tests concise: 'Entry' label need not be provided
@@ -244,19 +243,6 @@ void test_detect_duplicate_segments() {
  );
 }

-//:: transform
-
-:(before "End Types")
-typedef void (*transform_fn)(program&);
-:(before "End Globals")
-vector<transform_fn> Transform;
-
-:(code)
-void transform(program& p) {
-  for (int t = 0;  t < SIZE(Transform);  ++t)
-    (*Transform.at(t))(p);
-}
-
 //:: load

 void load(const program& p) {
--- a/030---translate.cc
+++ b/030---translate.cc
@@ -1,20 +1,9 @@
-//: The bedrock level 1 of abstraction is now done, and we're going to start
-//: building levels above it that make programming in x86 machine code a
-//: little more ergonomic.
-//:
-//: All levels will be "pass through by default". Whatever they don't
-//: understand they will silently pass through to lower levels.
-//:
-//: Since raw hex bytes of machine code are always possible to inject, SubX is
-//: not a language, and we aren't building a compiler. This is something
-//: deliberately leakier. Levels are more for improving auditing, checks and
-//: error messages rather than for hiding low-level details.
+//: After that lengthy prelude to define an x86 emulator, we are now ready to
+//: start translating SubX notation.

 //: Translator workflow: read 'source' file. Run a series of transforms on it,
 //: each passing through what it doesn't understand. The final program should
-//: be just machine code, suitable to write to an ELF binary.
-//:
-//: Higher levels usually transform code on the basis of metadata.
+//: be just machine code, suitable to emulate, or to write to an ELF binary.

 :(before "End Main")
 if (is_equal(argv[1], "translate")) {
@@ -69,6 +58,10 @@ if (is_equal(argv[1], "translate")) {
 }

 :(code)
+void transform(program& p) {
+  // End transform(program& p)
+}
+
 void print_translate_usage() {
  cerr << "Usage: subx translate file1 file2 ... -o output\n";
 }
--- a/031transforms.cc
+++ b/031transforms.cc
@@ -1,64 +1,11 @@
-//: Ordering transforms is a well-known hard problem when building compilers.
-//: In our case we also have the additional notion of layers. The ordering of
-//: layers can have nothing in common with the ordering of transforms when
-//: SubX is tangled and run. This can be confusing for readers, particularly
-//: if later layers start inserting transforms at arbitrary points between
-//: transforms introduced earlier. Over time adding transforms can get harder
-//: and harder, having to meet the constraints of everything that's come
-//: before. It's worth thinking about organization up-front so the ordering is
-//: easy to hold in our heads, and it's obvious where to add a new transform.
-//: Some constraints:
-//:
-//:   1. Layers force us to build SubX bottom-up; since we want to be able to
-//:   build and run SubX after stopping loading at any layer, the overall
-//:   organization has to be to introduce primitives before we start using
-//:   them.
-//:
-//:   2. Transforms usually need to be run top-down, converting high-level
-//:   representations to low-level ones so that low-level layers can be
-//:   oblivious to them.
-//:
-//:   3. When running we'd often like new representations to be checked before
-//:   they are transformed away. The whole reason for new representations is
-//:   often to add new kinds of automatic checking for our machine code
-//:   programs.
-//:
-//: Putting these constraints together, we'll use the following broad
-//: organization:
-//:
-//:   a) We'll divide up our transforms into "levels", each level consisting
-//:   of multiple transforms, and dealing in some new set of representational
-//:   ideas. Levels will be added in reverse order to the one their transforms
-//:   will be run in.
-//:
-//:     To run all transforms:
-//:       Load transforms for level n
-//:       Load transforms for level n-1
-//:       ...
-//:       Load transforms for level 2
-//:       Run code at level 1
-//:
-//:   b) *Within* a level we'll usually introduce transforms in the order
-//:   they're run in.
-//:
-//:     To run transforms for level n:
-//:       Perform transform of layer l
-//:       Perform transform of layer l+1
-//:       ...
-//:
-//:   c) Within a level it's often most natural to introduce a new
-//:   representation by showing how it's transformed to the level below. To
-//:   make such exceptions more obvious checks usually won't be first-class
-//:   transforms; instead code that keeps the program unmodified will run
-//:   within transforms before they mutate the program. As an example:
-//:
-//:     Layer l introduces a transform
-//:     Layer l+1 adds precondition checks for the transform
-//:
-//: This may all seem abstract, but will hopefully make sense over time. The
-//: goals are basically to always have a working program after any layer, to
-//: have the order of layers make narrative sense, and to order transforms
-//: correctly at runtime.
+:(before "End Types")
+typedef void (*transform_fn)(program&);
+:(before "End Globals")
+vector<transform_fn> Transform;
+
+:(before "End transform(program& p)")
+for (int t = 0;  t < SIZE(Transform);  ++t)
+  (*Transform.at(t))(p);

 :(before "End One-time Setup")
 // Begin Transforms
--- a/032---operands.cc
+++ b/032---operands.cc
@@ -1,5 +1,4 @@
-//: Beginning of "level 2": tagging bytes with metadata around what field of
-//: an x86 instruction they're for.
+//: Metadata for fields of an x86 instruction.
 //:
 //: The x86 instruction set is variable-length, and how a byte is interpreted
 //: affects later instruction boundaries. A lot of the pain in programming
@@ -27,6 +26,10 @@ put_new(Help, "instructions",
 :(before "End Help Contents")
 cerr << "  instructions\n";

+:(before "Running Test Program")
+transform(p);
+if (trace_contains_errors()) return;
+
 :(code)
 void test_pack_immediate_constants() {
  run(