mu/034compute_segment_address.cc

//: ELF binaries have finicky rules about the precise address each segment
//: should start at: it must agree with the segment's offset in the file,
//: modulo the segment's alignment (p_align). So segment addresses depend on
//: the amount of code in a program.
//: We shouldn't expect people to adjust segment addresses every time they make
//: a change to their programs.
//: Let's start taking the given segment addresses as guidelines, and adjust
//: them as necessary.
//: This gives up a measure of control in placing code and data.

void test_segment_name() {
  run(
      "== code 0x09000000\n"
      "05/add-to-EAX  0x0d0c0b0a/imm32\n"
      // code starts at 0x09000000 + p_offset, which is 0x54 (0x34 for the ehdr + 0x20 for one phdr) for a single-segment binary
  );
  CHECK_TRACE_CONTENTS(
      "load: 0x09000054 -> 05\n"
      "load: 0x09000055 -> 0a\n"
      "load: 0x09000056 -> 0b\n"
      "load: 0x09000057 -> 0c\n"
      "load: 0x09000058 -> 0d\n"
      "run: add imm32 0x0d0c0b0a to EAX\n"
      "run: storing 0x0d0c0b0a\n"
  );
}

//: compute segment address

:(before "End Level-2 Transforms")
Transform.push_back(compute_segment_starts);

:(code)
void compute_segment_starts(program& p) {
  trace(3, "transform") << "-- compute segment addresses" << end();
  uint32_t p_offset = /*size of ehdr*/0x34 + SIZE(p.segments)*0x20/*size of each phdr*/;
  for (size_t i = 0;  i < p.segments.size();  ++i) {
    segment& curr = p.segments.at(i);
    if (curr.start >= 0x08000000) {
      // valid address for user space, so assume we're creating a real ELF binary, not just running a test
      curr.start &= 0xfffff000;  // same number of zeros as the p_align used when emitting the ELF binary
      curr.start |= (p_offset & 0xfff);
      trace(99, "transform") << "segment " << i << " begins at address 0x" << HEXWORD << curr.start << end();
    }
    p_offset += size_of(curr);
    assert(p_offset < SEGMENT_ALIGNMENT);  // for now we get less and less available space in each successive segment
  }
}
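
//: The adjustment above can be tried out in isolation. What follows is only a
//: sketch under stated assumptions, not part of this layer: `Seg`,
//: `adjust_starts` and the constant names are hypothetical stand-ins, and a
//: segment is reduced to a requested start address plus a size in bytes. It
//: keeps the page the programmer asked for and replaces the low 12 bits with
//: the segment's offset in the file.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for `segment`: a requested start address and the
// number of bytes the segment will occupy in the binary.
struct Seg {
  uint32_t start;
  uint32_t size;
};

const uint32_t kEhdrSize = 0x34;   // size of the ELF header, as in the layer
const uint32_t kPhdrSize = 0x20;   // size of each program header, as in the layer
const uint32_t kPageMask = 0xfff;  // low bits cleared by the 0xfffff000 mask

void adjust_starts(std::vector<Seg>& segs) {
  // Code and data follow the ELF header and one program header per segment.
  uint32_t p_offset = kEhdrSize + static_cast<uint32_t>(segs.size()) * kPhdrSize;
  for (Seg& s : segs) {
    if (s.start >= 0x08000000) {
      // Treat the given address as a guideline: keep its page, then make the
      // low bits match the offset of the segment within the file.
      s.start = (s.start & ~kPageMask) | (p_offset & kPageMask);
    }
    p_offset += s.size;
    assert(p_offset >= s.size);  // guard against uint32_t wraparound
  }
}
```

//: With a single 5-byte segment requested at 0x09000000, the start becomes
//: 0x09000054, matching the test above. With two segments the headers take
//: 0x74 bytes, so the first segment starts at 0x09000074 and a second one
//: requested at 0x0a000000 starts at 0x0a000079.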

uint32_t size_of(const segment& s) {
  uint32_t sum = 0;
  for (int i = 0;  i < SIZE(s.lines);  ++i)
    sum += num_bytes(s.lines.at(i));
  return sum;
}

// Assumes all bitfields are packed.
uint32_t num_bytes(const line& inst) {
  uint32_t sum = 0;
  for (int i = 0;  i < SIZE(inst.words);  ++i)
    sum += size_of(inst.words.at(i));
  return sum;
}

int size_of(const word& w) {
  if (has_operand_metadata(w, "disp32") || has_operand_metadata(w, "imm32"))
    return 4;
  else if (has_operand_metadata(w, "disp16"))
    return 2;
  // End size_of(word w) Special-cases
  else
    return 1;
}
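
//: To see the size rules at work without the rest of the layers, here is a
//: minimal sketch that operates on plain strings. `has_tag`, `word_size` and
//: `line_size` are hypothetical names, and the assumption that metadata is
//: whatever follows a '/' in a whitespace-separated word is inferred from the
//: test at the top of this layer rather than taken from the real parser.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: does `word` carry `tag` among its slash-separated
// metadata? The part before the first slash is the datum and is skipped.
bool has_tag(const std::string& word, const std::string& tag) {
  std::istringstream in(word);
  std::string part;
  std::getline(in, part, '/');  // skip the datum
  while (std::getline(in, part, '/'))
    if (part == tag) return true;
  return false;
}

// Same rules as size_of(word) above: 32-bit operands take 4 bytes,
// 16-bit displacements take 2, everything else takes 1.
uint32_t word_size(const std::string& word) {
  if (has_tag(word, "disp32") || has_tag(word, "imm32")) return 4;
  if (has_tag(word, "disp16")) return 2;
  return 1;
}

// Sum the sizes of all whitespace-separated words on a line.
uint32_t line_size(const std::string& line) {
  std::istringstream in(line);
  std::string word;
  uint32_t sum = 0;
  while (in >> word) sum += word_size(word);
  return sum;
}
```

//: For the instruction in the test above, "05/add-to-EAX  0x0d0c0b0a/imm32",
//: this counts 1 byte for the opcode and 4 for the immediate: 5 bytes, the
//: same number of bytes the test expects to be loaded.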

//: Dependencies:
//: - We'd like to compute segment addresses before setting up global variables,
//:   because computing addresses for global variables requires knowing where
//:   the data segment starts.
//: - We'd like to finish expanding labels before computing segment addresses,
//:   because it would make computing the sizes of segments more self-contained
//:   (num_bytes).
//:
//: Decision: compute segment addresses before expanding labels, by being
//: aware in this layer of certain operand types that will eventually occupy
//: multiple bytes.
//:
//: The later layer that expands labels hooks into num_bytes() to teach this
//: layer that labels occupy zero space in the binary.