5863
Just clarified for myself why `subx translate` and `subx run` need to share code: emulation exists first and foremost to support the tests. In the process we clean up our architecture of "levels" of layers. It was a good idea, but it goes unused once we reconceive of "level 1" as just part of the test harness.
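The sharing described above can be sketched in a few lines. This is a minimal illustration, not the SubX sources: the `program` struct, `drop_metadata`, `translate` and `run_in_test` are hypothetical stand-ins, and only the `transform_fn` typedef and the `Transform` registry mirror names that appear in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for SubX's `program` type: just a list of words.
struct program {
  std::vector<std::string> words;
};

typedef void (*transform_fn)(program&);

// Registry of transforms, in the order they should run.
std::vector<transform_fn> Transform;

// Run every registered transform in order, mutating the program in place.
void transform(program& p) {
  for (std::size_t t = 0; t < Transform.size(); ++t) {
    assert(Transform.at(t) != NULL);
    (*Transform.at(t))(p);
  }
}

// Hypothetical transform: strip everything from the first '/' in each word.
void drop_metadata(program& p) {
  for (std::size_t i = 0; i < p.words.size(); ++i) {
    std::string& w = p.words.at(i);
    std::size_t slash = w.find('/');
    if (slash != std::string::npos) w.erase(slash);
  }
}

// Hypothetical entry points. Both go through the same transform(), so the
// bytes a test emulates are the bytes the translator would write out.
std::vector<std::string> translate(program p) {
  transform(p);
  return p.words;  // a real translator would write these to an ELF binary
}

std::vector<std::string> run_in_test(program p) {
  transform(p);
  return p.words;  // a real test harness would load and emulate these
}
```

Registering `drop_metadata` in `Transform` and passing the same `program` to both entry points yields identical words from each, which is the property the commit message is after.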
parent 01013f2ad2
commit d02aa9ac0b

010---vm.cc (10 lines changed)
@@ -1,11 +1,5 @@
-//: Core data structures for simulating the SubX VM (subset of an x86 processor)
-//:
-//: At the lowest level ("level 1") of abstraction, SubX executes x86
-//: instructions provided in the form of an array of bytes, loaded into memory
-//: starting at a specific address.
-//:
-//: SubX is fundamentally a translator. But having a VM to execute its
-//: translations affords greater confidence in it.
+//: Core data structures for simulating the SubX VM (subset of an x86 processor),
+//: either in tests or debug aids.

 //:: registers
 //: assume segment registers are hard-coded to 0

011run.cc (20 lines changed)
@@ -78,15 +78,14 @@ void test_copy_imm32_to_EAX() {
   );
 }

-// top-level helper for scenarios: parse the input, transform any macros, load
-// the final hex bytes into memory, run it
+// top-level helper for tests: parse the input, load the hex bytes into memory, run
 void run(const string& text_bytes) {
   program p;
   istringstream in(text_bytes);
+  // Loading Test Program
   parse(in, p);
   if (trace_contains_errors()) return;  // if any stage raises errors, stop immediately
-  transform(p);
-  if (trace_contains_errors()) return;
+  // Running Test Program
   load(p);
   if (trace_contains_errors()) return;
   // convenience to keep tests concise: 'Entry' label need not be provided
@@ -244,19 +243,6 @@ void test_detect_duplicate_segments() {
   );
 }

-//:: transform
-
-:(before "End Types")
-typedef void (*transform_fn)(program&);
-:(before "End Globals")
-vector<transform_fn> Transform;
-
-:(code)
-void transform(program& p) {
-  for (int t = 0; t < SIZE(Transform); ++t)
-    (*Transform.at(t))(p);
-}
-
 //:: load

 void load(const program& p) {

@@ -1,20 +1,9 @@
-//: The bedrock level 1 of abstraction is now done, and we're going to start
-//: building levels above it that make programming in x86 machine code a
-//: little more ergonomic.
-//:
-//: All levels will be "pass through by default". Whatever they don't
-//: understand they will silently pass through to lower levels.
-//:
-//: Since raw hex bytes of machine code are always possible to inject, SubX is
-//: not a language, and we aren't building a compiler. This is something
-//: deliberately leakier. Levels are more for improving auditing, checks and
-//: error messages rather than for hiding low-level details.
+//: After that lengthy prelude to define an x86 emulator, we are now ready to
+//: start translating SubX notation.

 //: Translator workflow: read 'source' file. Run a series of transforms on it,
 //: each passing through what it doesn't understand. The final program should
-//: be just machine code, suitable to write to an ELF binary.
-//:
-//: Higher levels usually transform code on the basis of metadata.
+//: be just machine code, suitable to emulate, or to write to an ELF binary.

 :(before "End Main")
 if (is_equal(argv[1], "translate")) {
@@ -69,6 +58,10 @@ if (is_equal(argv[1], "translate")) {
 }

 :(code)
+void transform(program& p) {
+  // End transform(program& p)
+}
+
 void print_translate_usage() {
   cerr << "Usage: subx translate file1 file2 ... -o output\n";
 }

@@ -1,64 +1,11 @@
-//: Ordering transforms is a well-known hard problem when building compilers.
-//: In our case we also have the additional notion of layers. The ordering of
-//: layers can have nothing in common with the ordering of transforms when
-//: SubX is tangled and run. This can be confusing for readers, particularly
-//: if later layers start inserting transforms at arbitrary points between
-//: transforms introduced earlier. Over time adding transforms can get harder
-//: and harder, having to meet the constraints of everything that's come
-//: before. It's worth thinking about organization up-front so the ordering is
-//: easy to hold in our heads, and it's obvious where to add a new transform.
-//: Some constraints:
-//:
-//: 1. Layers force us to build SubX bottom-up; since we want to be able to
-//: build and run SubX after stopping loading at any layer, the overall
-//: organization has to be to introduce primitives before we start using
-//: them.
-//:
-//: 2. Transforms usually need to be run top-down, converting high-level
-//: representations to low-level ones so that low-level layers can be
-//: oblivious to them.
-//:
-//: 3. When running we'd often like new representations to be checked before
-//: they are transformed away. The whole reason for new representations is
-//: often to add new kinds of automatic checking for our machine code
-//: programs.
-//:
-//: Putting these constraints together, we'll use the following broad
-//: organization:
-//:
-//: a) We'll divide up our transforms into "levels", each level consisting
-//: of multiple transforms, and dealing in some new set of representational
-//: ideas. Levels will be added in reverse order to the one their transforms
-//: will be run in.
-//:
-//: To run all transforms:
-//: Load transforms for level n
-//: Load transforms for level n-1
-//: ...
-//: Load transforms for level 2
-//: Run code at level 1
-//:
-//: b) *Within* a level we'll usually introduce transforms in the order
-//: they're run in.
-//:
-//: To run transforms for level n:
-//: Perform transform of layer l
-//: Perform transform of layer l+1
-//: ...
-//:
-//: c) Within a level it's often most natural to introduce a new
-//: representation by showing how it's transformed to the level below. To
-//: make such exceptions more obvious checks usually won't be first-class
-//: transforms; instead code that keeps the program unmodified will run
-//: within transforms before they mutate the program. As an example:
-//:
-//: Layer l introduces a transform
-//: Layer l+1 adds precondition checks for the transform
-//:
-//: This may all seem abstract, but will hopefully make sense over time. The
-//: goals are basically to always have a working program after any layer, to
-//: have the order of layers make narrative sense, and to order transforms
-//: correctly at runtime.
+:(before "End Types")
+typedef void (*transform_fn)(program&);
+:(before "End Globals")
+vector<transform_fn> Transform;
+
+:(before "End transform(program& p)")
+for (int t = 0; t < SIZE(Transform); ++t)
+  (*Transform.at(t))(p);

 :(before "End One-time Setup")
 // Begin Transforms
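
The `:(before "End transform(program& p)")` directive in this hunk depends on the tangling step splicing the lines that follow it in ahead of a waypoint comment declared in an earlier layer. A toy model of that splice is sketched here, assuming plain substring matching on the waypoint text. The tangle tool's actual matching rules are not shown in this commit, and `insert_before` is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Insert `addition` immediately before the first line that contains
// `waypoint`. Returns false (leaving `lines` untouched) if no line matches.
// Assumption: substring matching; the real tool may match differently.
bool insert_before(std::vector<std::string>& lines,
                   const std::string& waypoint,
                   const std::vector<std::string>& addition) {
  assert(!waypoint.empty());
  for (std::vector<std::string>::iterator it = lines.begin();
       it != lines.end(); ++it) {
    if (it->find(waypoint) != std::string::npos) {
      // insert() invalidates `it`, so return right away instead of iterating on
      lines.insert(it, addition.begin(), addition.end());
      return true;
    }
  }
  return false;
}
```

Applied to the empty `transform()` body that this commit adds next to `subx translate`, the splice would place the `for` loop over `Transform` just above the `// End transform(program& p)` comment, leaving the waypoint in place for later layers to target.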

@@ -1,5 +1,4 @@
-//: Beginning of "level 2": tagging bytes with metadata around what field of
-//: an x86 instruction they're for.
+//: Metadata for fields of an x86 instruction.
 //:
 //: The x86 instruction set is variable-length, and how a byte is interpreted
 //: affects later instruction boundaries. A lot of the pain in programming
@@ -27,6 +26,10 @@ put_new(Help, "instructions",
 :(before "End Help Contents")
 cerr << " instructions\n";

+:(before "Running Test Program")
+transform(p);
+if (trace_contains_errors()) return;
+
 :(code)
 void test_pack_immediate_constants() {
   run(