mu/034compute_segment_address.cc

87 lines
3.1 KiB
C++
Raw Normal View History

//: ELF binaries have finicky rules about the precise alignment each segment
//: should start at. They depend on the amount of code in a program.
//: We shouldn't expect people to adjust segment addresses everytime they make
//: a change to their programs.
//: Let's start taking the given segment addresses as guidelines, and adjust
//: them as necessary.
//: This gives up a measure of control in placing code and data.
5001 - drop the :(scenario) DSL I've been saying for a while[1][2][3] that adding extra abstractions makes things harder for newcomers, and adding new notations doubly so. And then I notice this DSL in my own backyard. Makes me feel like a hypocrite. [1] https://news.ycombinator.com/item?id=13565743#13570092 [2] https://lobste.rs/s/to8wpr/configuration_files_are_canary_warning [3] https://lobste.rs/s/mdmcdi/little_languages_by_jon_bentley_1986#c_3miuf2 The implementation of the DSL was also highly hacky: a) It was happening in the tangle/ tool, but was utterly unrelated to tangling layers. b) There were several persnickety constraints on the different kinds of lines and the specific order they were expected in. I kept finding bugs where the translator would silently do the wrong thing. Or the error messages sucked, and readers may be stuck looking at the generated code to figure out what happened. Fixing error messages would require a lot more code, which is one of my arguments against DSLs in the first place: they may be easy to implement, but they're hard to design to go with the grain of the underlying platform. They require lots of iteration. Is that effort worth prioritizing in this project? On the other hand, the DSL did make at least some readers' life easier, the ones who weren't immediately put off by having to learn a strange syntax. There were fewer quotes to parse, fewer backslash escapes. Anyway, since there are also people who dislike having to put up with strange syntaxes, we'll call that consideration a wash and tear this DSL out. --- This commit was sheer drudgery. Hopefully it won't need to be redone with a new DSL because I grow sick of backslashes.
2019-03-13 01:56:55 +00:00
void test_segment_name() {
run(
"== code 0x09000000\n"
5001 - drop the :(scenario) DSL I've been saying for a while[1][2][3] that adding extra abstractions makes things harder for newcomers, and adding new notations doubly so. And then I notice this DSL in my own backyard. Makes me feel like a hypocrite. [1] https://news.ycombinator.com/item?id=13565743#13570092 [2] https://lobste.rs/s/to8wpr/configuration_files_are_canary_warning [3] https://lobste.rs/s/mdmcdi/little_languages_by_jon_bentley_1986#c_3miuf2 The implementation of the DSL was also highly hacky: a) It was happening in the tangle/ tool, but was utterly unrelated to tangling layers. b) There were several persnickety constraints on the different kinds of lines and the specific order they were expected in. I kept finding bugs where the translator would silently do the wrong thing. Or the error messages sucked, and readers may be stuck looking at the generated code to figure out what happened. Fixing error messages would require a lot more code, which is one of my arguments against DSLs in the first place: they may be easy to implement, but they're hard to design to go with the grain of the underlying platform. They require lots of iteration. Is that effort worth prioritizing in this project? On the other hand, the DSL did make at least some readers' life easier, the ones who weren't immediately put off by having to learn a strange syntax. There were fewer quotes to parse, fewer backslash escapes. Anyway, since there are also people who dislike having to put up with strange syntaxes, we'll call that consideration a wash and tear this DSL out. --- This commit was sheer drudgery. Hopefully it won't need to be redone with a new DSL because I grow sick of backslashes.
2019-03-13 01:56:55 +00:00
"05/add-to-EAX 0x0d0c0b0a/imm32\n"
// code starts at 0x09000000 + p_offset, which is 0x54 for a single-segment binary
5001 - drop the :(scenario) DSL I've been saying for a while[1][2][3] that adding extra abstractions makes things harder for newcomers, and adding new notations doubly so. And then I notice this DSL in my own backyard. Makes me feel like a hypocrite. [1] https://news.ycombinator.com/item?id=13565743#13570092 [2] https://lobste.rs/s/to8wpr/configuration_files_are_canary_warning [3] https://lobste.rs/s/mdmcdi/little_languages_by_jon_bentley_1986#c_3miuf2 The implementation of the DSL was also highly hacky: a) It was happening in the tangle/ tool, but was utterly unrelated to tangling layers. b) There were several persnickety constraints on the different kinds of lines and the specific order they were expected in. I kept finding bugs where the translator would silently do the wrong thing. Or the error messages sucked, and readers may be stuck looking at the generated code to figure out what happened. Fixing error messages would require a lot more code, which is one of my arguments against DSLs in the first place: they may be easy to implement, but they're hard to design to go with the grain of the underlying platform. They require lots of iteration. Is that effort worth prioritizing in this project? On the other hand, the DSL did make at least some readers' life easier, the ones who weren't immediately put off by having to learn a strange syntax. There were fewer quotes to parse, fewer backslash escapes. Anyway, since there are also people who dislike having to put up with strange syntaxes, we'll call that consideration a wash and tear this DSL out. --- This commit was sheer drudgery. Hopefully it won't need to be redone with a new DSL because I grow sick of backslashes.
2019-03-13 01:56:55 +00:00
);
CHECK_TRACE_CONTENTS(
"load: 0x09000054 -> 05\n"
"load: 0x09000055 -> 0a\n"
"load: 0x09000056 -> 0b\n"
"load: 0x09000057 -> 0c\n"
"load: 0x09000058 -> 0d\n"
2019-05-13 19:10:23 +00:00
"run: add imm32 0x0d0c0b0a to EAX\n"
5001 - drop the :(scenario) DSL I've been saying for a while[1][2][3] that adding extra abstractions makes things harder for newcomers, and adding new notations doubly so. And then I notice this DSL in my own backyard. Makes me feel like a hypocrite. [1] https://news.ycombinator.com/item?id=13565743#13570092 [2] https://lobste.rs/s/to8wpr/configuration_files_are_canary_warning [3] https://lobste.rs/s/mdmcdi/little_languages_by_jon_bentley_1986#c_3miuf2 The implementation of the DSL was also highly hacky: a) It was happening in the tangle/ tool, but was utterly unrelated to tangling layers. b) There were several persnickety constraints on the different kinds of lines and the specific order they were expected in. I kept finding bugs where the translator would silently do the wrong thing. Or the error messages sucked, and readers may be stuck looking at the generated code to figure out what happened. Fixing error messages would require a lot more code, which is one of my arguments against DSLs in the first place: they may be easy to implement, but they're hard to design to go with the grain of the underlying platform. They require lots of iteration. Is that effort worth prioritizing in this project? On the other hand, the DSL did make at least some readers' life easier, the ones who weren't immediately put off by having to learn a strange syntax. There were fewer quotes to parse, fewer backslash escapes. Anyway, since there are also people who dislike having to put up with strange syntaxes, we'll call that consideration a wash and tear this DSL out. --- This commit was sheer drudgery. Hopefully it won't need to be redone with a new DSL because I grow sick of backslashes.
2019-03-13 01:56:55 +00:00
"run: storing 0x0d0c0b0a\n"
);
}
2018-10-01 15:47:15 +00:00
//: compute segment address
:(before "End Level-2 Transforms")
Transform.push_back(compute_segment_starts);
:(code)
void compute_segment_starts(program& p) {
trace(3, "transform") << "-- compute segment addresses" << end();
uint32_t p_offset = /*size of ehdr*/0x34 + SIZE(p.segments)*0x20/*size of each phdr*/;
for (size_t i = 0; i < p.segments.size(); ++i) {
segment& curr = p.segments.at(i);
if (curr.start >= 0x08000000) {
// valid address for user space, so assume we're creating a real ELF binary, not just running a test
curr.start &= 0xfffff000; // same number of zeros as the p_align used when emitting the ELF binary
curr.start |= (p_offset & 0xfff);
2018-09-20 20:42:57 +00:00
trace(99, "transform") << "segment " << i << " begins at address 0x" << HEXWORD << curr.start << end();
}
p_offset += size_of(curr);
assert(p_offset < SEGMENT_ALIGNMENT); // for now we get less and less available space in each successive segment
}
}
uint32_t size_of(const segment& s) {
uint32_t sum = 0;
for (int i = 0; i < SIZE(s.lines); ++i)
sum += num_bytes(s.lines.at(i));
return sum;
}
// Assumes all bitfields are packed.
uint32_t num_bytes(const line& inst) {
uint32_t sum = 0;
for (int i = 0; i < SIZE(inst.words); ++i)
sum += size_of(inst.words.at(i));
return sum;
}
2018-09-21 17:06:17 +00:00
int size_of(const word& w) {
if (has_operand_metadata(w, "disp32") || has_operand_metadata(w, "imm32"))
return 4;
else if (has_operand_metadata(w, "disp16"))
return 2;
// End size_of(word w) Special-cases
else
return 1;
}
2018-09-21 17:06:17 +00:00
//: Dependencies:
//: - We'd like to compute segment addresses before setting up global variables,
//: because computing addresses for global variables requires knowing where
//: the data segment starts.
//: - We'd like to finish expanding labels before computing segment addresses,
//: because it would make computing the sizes of segments more self-contained
//: (num_bytes).
//:
//: Decision: compute segment addresses before expanding labels, by being
//: aware in this layer of certain operand types that will eventually occupy
//: multiple bytes.
//:
//: The layer to expand labels later hooks into num_bytes() to teach this
//: layer that labels occupy zero space in the binary.