Merge branch 'master' of tildegit.org:lucic71/dissertation

This commit is contained in:
lucic71 2022-12-18 13:24:39 +02:00
commit 1516884315
23 changed files with 5665 additions and 12 deletions

View File

@ -20,4 +20,33 @@ Quite clear that OpenBSD focused up until now on GCC and clang/LLVM.
Basically this is all the state of the art I need for this project.
However I will also probe the state of the art for other operating
systems too, mainly other BSDs, Linux, Plan 9, and other obscure
operating systems.
5. [The state of toolchains in NetBSD](https://www.cambus.net/the-state-of-toolchains-in-netbsd/)
Looks like NetBSD is more focused on GCC. What does 'portability'
from 'NetBSD is portable' mean then? Technically it should be:
'NetBSD is portable but it's portable as long as it is compiled
with GCC/LLVM'. I'm not saying that it should compile with every
existing compiler in the world, but they should at least change their
definition of portability.
6. [BSD Toolchain Project](https://wiki.freebsd.org/BSDToolchain)
This is somehow the natural path that the BSD projects should have
taken. They wrote so much of the software stack (user space programs,
network stack, etc.) but were unable to write their own toolchain.
A BSD toolchain is the missing part of the
BSD projects. However, the direction they take with their toolchain
is different from mine. I don't have much experience with the LLVM
architecture but it seems too complex to fit the 'hackable
and clean architecture' model that I have in mind. At some point
I must define these terms more clearly. FreeBSD is focused on
offering high quality code, error messages, and a good overall
experience of using the toolchain. I'm not interested in that,
I want to keep the toolchain as simple as possible so that one
person can maintain it and grasp it.
7. [C History](https://www.bell-labs.com/usr/dmr/www/chist.html)
8. [ANSI/POSIX Environment](http://doc.cat-v.org/plan_9/4th_edition/papers/ape)
9. [Some BSD compiler history](https://forums.freebsd.org/threads/compiler-question-why-bsd-disowned-bsds-c-gcc-and-clang.80387/)

View File

@ -16,4 +16,8 @@ read more carefully.
I don't know how interesting is this for me because it tackles bigger
issues of porting the system to a new architecture, including writing
new assembly code and stuff for the new hardware. What I am interested
in is the role of the compiler in this story.
5. [The Amsterdam Compiler Kit](https://tack.sourceforge.net/)
6. [Plan9 C Compiler](https://9p.io/sys/doc/comp.html)
7. [O3ONE](http://o3one.org/)
8. [Collapse OS](http://collapseos.org/)

View File

@ -3,4 +3,18 @@ and this caused it to be a security risk for the entire system.
Can I make the same case for the compiler? It does not run at the
same privilege level as sudo/doas but it has a critical role in
the system too. In the long run, it is the component that generates
all the code that runs on the processor.
2. At some moment in the near future I should probe the state of the
art regarding tiny compilers. I admit that I went too fast compiling
OpenBSD with tcc as there might be better solutions that are not so
GNU-dependent. Today I saw that kefircc or chibicc might be better
alternatives, even lcc, we will see.
3. Could I use the New Jersey approach as a motivation for my project?
I understand from Gabriel that software that is limited, but simple,
is more valuable than software full of functionality. It would be
interesting to see what the trend is in the compiler world and in OSes
regarding the NJ vs MIT debate. My intuition is that people drifted away
from the notion of NJ, but they did not turn to MIT; instead
the approach is to add functionality without thinking about the
architectural concepts. Check [this](https://www.dreamsongs.com/WorseIsBetter.html)
for more details about the NJ software approach.

View File

@ -82,3 +82,19 @@ that C might not be the best solution for this job.
[3] https://bellard.org/tcc/
[4] https://2019.asiabsdcon.org/proc-body-2019.pdf#page=13
[5] https://ieeexplore.ieee.org/document/1004368
I know that I should make another file for spontaneous ideas, but
I will put them here.
Let's say I have the following motivation: code that keeps the GNU
legacy is hard to verify in some sense (because of the multitude of
non-standard behaviours and of the complexity). What if we get rid
of that and then try to formally analyze the code in order to better
understand the security of OpenBSD? Would it be a good enough motivation?
I don't know. To strengthen this motivation, I need to see what
the disadvantages of using GNU software are.
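To make 'non-standard behaviours' a bit more concrete, here is a small
sketch (not taken from the OpenBSD tree) of the same operation written
once in plain ISO C and once with GNU extensions. The names `max_int`,
`MAX` and `largest_of_three` are made up for illustration; the only
claim is that statement expressions and `__typeof__` are GNU C
extensions that are not part of C89 or C99.

```c
/* Plain ISO C: a function, each argument is evaluated exactly once. */
int max_int(int a, int b)
{
    return a > b ? a : b;
}

#ifdef __GNUC__
/* GNU C only: a statement expression ({ ... }) combined with __typeof__.
   A compiler that does not implement these extensions rejects this. */
#define MAX(a, b) \
    ({ __typeof__(a) a_ = (a); __typeof__(b) b_ = (b); a_ > b_ ? a_ : b_; })
#else
/* Fallback for compilers without the extensions (illustration only). */
#define MAX(a, b) max_int((a), (b))
#endif

/* Hypothetical caller: compiles everywhere only because of the #ifdef. */
int largest_of_three(int x, int y, int z)
{
    int m = MAX(x, y);
    return MAX(m, z);
}
```

Code that uses the first form of `MAX` without the `#ifdef` is tied to
compilers that implement the GNU dialect, which is the kind of dependency
I would have to look for.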
In a way I am pissed that I need to find these rocket-science motivations
in order to be able to run my porting project. Why can't I simply
port OpenBSD to a new compiler and see what conclusions I draw after
I finish?

13
TSW/Makefile Normal file
View File

@ -0,0 +1,13 @@
PROJECT=main
build:
	pdflatex ${PROJECT}.tex
	bibtex ${PROJECT}
	pdflatex ${PROJECT}.tex
	pdflatex ${PROJECT}.tex
show:
	evince ${PROJECT}.pdf
clean:
	rm -f *.aux *.blg *.out *.bbl *.log *.toc *.pdf

15
TSW/abstract.tex Normal file
View File

@ -0,0 +1,15 @@
\begin{abstract}
\noindent
The ISO C Standard introduced the notion of undefined behavior as a means
to achieve portability. State-of-the-art compilers such as GCC and LLVM use it to
perform aggressive optimizations that break the semantics of the code.
\todo{or the intention of the programmer} We argue that the performance
impact of undefined behavior based optimizations in operating systems,
such as OpenBSD, is low. Furthermore, they introduce more unobservable
and undocumented effects than performance advantages. To test our
hypothesis we aim to disable the above-mentioned \todo{should i write UB
based optimizations instead?} optimizations and test the generated system
against real-life testbenches \todo{too generic}.
\end{abstract}
\noindent\keywords{compiler, optimization, operating system}\\

58
TSW/bib.bib Normal file
View File

@ -0,0 +1,58 @@
@inproceedings{wang2012undefined,
title={Undefined behavior: what happened to my code?},
author={Wang, Xi and Chen, Haogang and Cheung, Alvin and Jia, Zhihao
and Zeldovich, Nickolai and Kaashoek, M Frans},
booktitle={Proceedings of the Asia-Pacific Workshop on Systems},
pages={1--7},
year={2012}
}
@misc{checks_2008,
title={CERT/CC Vulnerability note vu162289},
url={https://www.kb.cert.org/vuls/id/162289/},
journal={VU162289 - C compilers may silently discard some wraparound
checks},
year={2008},
month={Apr}
}
@inproceedings{wang2013towards,
title={Towards optimization-safe systems: Analyzing the impact of
undefined behavior},
author={Wang, Xi and Zeldovich, Nickolai and Kaashoek, M Frans and
Solar-Lezama, Armando},
booktitle={Proceedings of the Twenty-Fourth ACM Symposium on Operating
Systems Principles},
pages={260--275},
year={2013}
}
@article{lomont2003fast,
title={Fast inverse square root},
author={Lomont, Chris},
journal={Technical Report},
volume={32},
year={2003}
}
@misc{google_2015, title={BORINGCC},
url={https://groups.google.com/g/boring-crypto/c/48qa1kWignU/m/o8GGp2K1DAAJ},
journal={Google}, publisher={Google}, year={2015}, month={Dec}}
@inproceedings{ertl2015every,
title={What every compiler writer should know about programmers or
Optimization based on undefined behaviour hurts performance},
author={Ertl, M Anton},
booktitle={Kolloquium Programmiersprachen und Grundlagen der
Programmierung (KPS 2015)},
year={2015}
}
@misc{regehr_2014,
title={Proposal for a Friendly Dialect of C},
url={https://blog.regehr.org/archives/1180},
journal={Embedded in Academia},
author={Regehr, John},
year={2014},
month={Aug}
}

215
TSW/bibliography.bib Normal file
View File

@ -0,0 +1,215 @@
@inproceedings{tanaka,
title={Approaches to making software porting more productive},
author={Tanaka, Toshikiyo and Hakuta, M and Iwata, N and Ohminami, M},
booktitle={Proceedings of the 12th TRON Project international Symposium},
pages={73--85},
year={1995},
organization={IEEE}
}
@article{hakuta,
title={A study of software portability evaluation},
author={Hakuta, Mitsuari and Ohminami, Masato},
journal={Journal of Systems and Software},
volume={38},
number={2},
pages={145--154},
year={1997},
publisher={Elsevier}
}
@inproceedings{kanai,
title={A cost model for software conversion based on program characteristics and a converter effect},
author={Kanai, A and Furuyama, T and Takahashi, M},
booktitle={1992 Proceedings. The Sixteenth Annual International Computer Software and Applications Conference},
pages={63--64},
year={1992},
organization={IEEE Computer Society}
}
@incollection{mooney2004developing,
title={Developing portable software},
author={Mooney, James D},
booktitle={Information Technology},
pages={55--84},
year={2004},
publisher={Springer}
}
@article{capretz,
title={Bringing the human factor to software engineering},
author={Capretz, Luiz Fernando},
journal={IEEE software},
volume={31},
number={2},
pages={104--104},
year={2014},
publisher={IEEE}
}
@article{ejiogu,
title={A simple measure of software complexity},
author={Ejiogu, Lem O},
journal={ACM SIGPLAN Notices},
volume={20},
number={3},
pages={16--31},
year={1985},
publisher={ACM New York, NY, USA}
}
@article{mooney1990strategies,
title={Strategies for supporting application portability},
author={Mooney, James D.},
journal={Computer},
volume={23},
number={11},
pages={59--70},
year={1990},
publisher={IEEE}
}
@inproceedings{cho2011case,
title={Case Study on Installing a Porting Process for Embedded Operating System in a Small Team},
author={Cho, DongSeok and Bae, DooHwan},
booktitle={2011 Fifth International Conference on Secure Software Integration and Reliability Improvement-Companion},
pages={19--25},
year={2011},
organization={IEEE}
}
@misc{porquet2015,
title={Porting {Linux} to a new processor architecture, part 1: The basics},
howpublished="\url{https://lwn.net/Articles/654783/}",
journal={[LWN.net]},
author={Porquet, Joël},
year={2015},
month={Aug}
}
@article{bodenstab1984unix,
title={The {UNIX} system: {UNIX} operating system porting experiences},
author={Bodenstab, DE and Houghton, Thomas F and Kelleman, Keith A and Ronkin, George and Schan, Edward P},
journal={AT\&T Bell Laboratories Technical Journal},
volume={63},
number={8},
pages={1769--1790},
year={1984},
publisher={Nokia Bell Labs}
}
@misc{osdevcrossport,
title={Cross-Porting Software},
howpublished="\url{https://wiki.osdev.org/Cross-Porting\_Software}",
journal={[osdev.org]},
year={2019},
month={Sept}
}
@article{jolitz1990porting,
title={Porting {UNIX} to the 386: A practical approach},
author={Jolitz, William Frederick and Jolitz, Lynne Greer},
journal={Dr. Dobb's Journal},
volume={16},
number={1},
pages={16--46},
year={1990},
publisher={CMP Media, Inc.}
}
@article{frakes1995sixteen,
title={Sixteen questions about software reuse},
author={Frakes, William B and Fox, Christopher J},
journal={Communications of the ACM},
volume={38},
number={6},
pages={75--ff},
year={1995},
publisher={ACM New York, NY, USA}
}
@article{tanenbaum1978guidelines,
title={Guidelines for software portability},
author={Tanenbaum, Andrew S and Klint, Paul and Bohm, Wim},
journal={Software: Practice and Experience},
volume={8},
number={6},
pages={681--698},
year={1978},
publisher={Wiley Online Library}
}
@article{johnson1978unix,
title={{UNIX} time-sharing system: Portability of C programs and the {UNIX} system},
author={Johnson, Steven C and Ritchie, Dennis M},
journal={The Bell System Technical Journal},
volume={57},
number={6},
pages={2021--2048},
year={1978},
publisher={Nokia Bell Labs}
}
@article{xia2017measuring,
title={Measuring program comprehension: A large-scale field study with professionals},
author={Xia, Xin and Bao, Lingfeng and Lo, David and Xing, Zhenchang and Hassan, Ahmed E and Li, Shanping},
journal={IEEE Transactions on Software Engineering},
volume={44},
number={10},
pages={951--976},
year={2017},
publisher={IEEE}
}
@article{morgan1994controlling,
title={Controlling software development costs},
author={Morgan, Malcolm J},
journal={Industrial Management \& Data Systems},
year={1994},
publisher={MCB UP Ltd}
}
@article{boehm2000software,
title={Software development cost estimation approaches—A survey},
author={Boehm, Barry and Abts, Chris and Chulani, Sunita},
journal={Annals of software engineering},
volume={10},
number={1},
pages={177--205},
year={2000},
publisher={Springer}
}
@article{walli1995posix,
title={The POSIX family of standards},
author={Walli, Stephen R},
journal={StandardView},
volume={3},
number={1},
pages={11--17},
year={1995},
publisher={ACM New York, NY, USA}
}
@inproceedings{spencer1992ifdef,
title={\# ifdef considered harmful, or portability experience with C News},
author={Spencer, Henry and Collyer, Geoff},
booktitle={USENIX Summer 1992 Technical Conference},
year={1992}
}
@misc{callahanopenbsd,
title={I ported the new Hare compiler to {OpenBSD}},
howpublished="\url{https://briancallahan.net/blog/20220427.html}",
journal={[Brian Robert Callahan]},
author={Callahan, B. R.},
year={2022},
month={April}
}
@article{lyonunixportability,
author = {Lyon, Tom},
year = {1977},
month = {08},
title = {Inter-{UNIX} Portability}
}

BIN
TSW/images/image.jpeg Normal file

Binary file not shown.


64
TSW/intro.tex Normal file
View File

@ -0,0 +1,64 @@
\section{Context and motivation}
The C89 standard provides a loose definition of undefined
behavior. This permits compiler implementations to
abuse the definition and use it as a means to aggressive optimizations
that break the semantics of the code. Code that works at a lower
optimization level breaks when the optimization level is raised.
Furthermore, code that works with previous versions of the compiler
suddenly breaks with newer versions because the standard imposes no
requirements on undefined behavior.
This has created serious security problems throughout the
years~\cite{wang2012undefined,checks_2008}. A number of initiatives to
solve this problem were started by different
parties~\cite{google_2015,regehr_2014,wang2013towards}; however, the
problem persists. The primary open source developer groups have
seized on the loose definition of undefined behavior to justify
dangerous silent code transformations that break the intention of the
programmer.
This philosophy is very dangerous in terms of programming expressiveness
and intent. The compiler has very little context of what the
developer wants to accomplish with a specific piece of code. For example,
the compiler cannot make the distinction between a read from an
uninitialized memory region that might produce unspecified results and a
read from a memory mapped device that cannot be written in order to
initialize it. Nor can it distinguish between an erroneous access to an
integer variable through a floating-point pointer and a smart method of
computing an arithmetic function~\cite{lomont2003fast}. The general principle is that
the developer is responsible for deciding what the code should do;
the job of the compiler is to translate the code into machine-readable
instructions and to apply optimizations only when there is no risk of
losing the developer's intent.
Those who defend this kind of optimization argue
that C code that contains undefined behavior has no meaning and the
compiler is free to modify it in various ways. In Control
Theory terms, such a system has a low degree of
controllability and observability. This is paradoxical under the
philosophy described above, where the compiler forcefully takes from
the developer the responsibility of generating relevant code. The
implication is that no meaningful engineering can be done in
a framework where the processes inside a compiler cannot be understood
and analyzed.
Another argument in defense of aggressive optimizations is that
code generated by these compilers runs faster on artificial benchmarks.
\todo{do some more research here} This does not necessarily hold for
real-life benchmarks that differ in complexity from the artificial ones and
that make use of non-trivial code constructs. Ertl~\cite{ertl2015every}
makes an interesting observation regarding the performance of undefined
behavior based optimizations. He notes that source-level changes buy
greater speedup factors than UB based optimizations for certain classes
of programs. While his research in this field is valuable, the
limitation of his work is that he draws conclusions based on SPECint
benchmarks.
The contribution of this work is that we try to analyze the speedup
factors for real life programs, such as operating systems, in particular
OpenBSD. \todo{motivate the choice of openbsd} By doing this we want to
provide a tradeoff analysis between the performance gained using UB
based optimizations and the risks of issuing them.
This paper is structured as follows. \todo{later}

5090
TSW/johd.bst Normal file

File diff suppressed because it is too large

36
TSW/johd.sty Normal file
View File

@ -0,0 +1,36 @@
\setlength{\paperwidth}{21cm} % A4
\setlength{\paperheight}{29.7cm}% A4
\setlength\topmargin{-0.5cm}
\setlength\oddsidemargin{0cm}
\setlength\textheight{24.7cm}
\setlength\textwidth{16.0cm}
\setlength\columnsep{0.6cm}
\newlength\titlebox
\setlength\titlebox{5cm}
\setlength\headheight{5pt}
\setlength\headsep{0pt}
\pagestyle{plain}
\usepackage{color}
\usepackage[natbibapa]{apacite}
\usepackage{xurl}
\usepackage[colorlinks,citecolor=blue,urlcolor=blue, linkcolor=blue, bookmarks=false,hypertexnames=true]{hyperref}
\usepackage{url}
%\usepackage{libertine}
\usepackage{float}
\usepackage{graphicx}
\usepackage{doi} % hyperlink URLs
\renewcommand{\doi}{DOI:~}
\newcommand\outauthor{
\begin{tabular}[t]{c}
\bf\@author
\end{tabular}}
%Add keyword command
\providecommand{\keywords}[1]
{\small\textbf{Keywords:} #1
}
\providecommand{\authorroles}[1]
{\small\textbf{Author roles:} #1
}

64
TSW/main.tex Normal file
View File

@ -0,0 +1,64 @@
\documentclass{article}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{johd}
\usepackage{color}
\newcommand{\todo}[1]{}
\renewcommand{\todo}[1]{{\color{red} TODO: {#1}}}
\title{Semantic Preservation of Undefined Behavior Based Optimizations}
\author{Popescu Lucian-Ioan \\
\small CS Department, Politehnica University of Bucharest,
Romania \\
\small lucian.popescu187@gmail.com
}
\date{}
\begin{document}
\maketitle
\input{abstract}
\input{intro}
\section{Dataset description}
Here you can provide, if applicable, information about the dataset(s) whose creation, collection, management, access, processing or analysis have been discussed in this paper, following this schema:
\paragraph{Object name} Typically the name of the file or file set in the repository.
\paragraph{Format names and versions} E.g., ASCII, CSV, Autocad, EPS, JPEG, Excel, SQL, etc.
\paragraph{Creation dates} The start and end dates of when the data was created (YYYY-MM-DD).
\paragraph{Dataset creators} Please list anyone who helped to create the dataset (who may or may not be an author of the data paper), including their roles (using and affiliations).
\paragraph{Language} Languages used in the dataset (i.e., for variable names etc.).
\paragraph{License} The open license under which the data has been deposited (e.g., CC0).
\paragraph{Repository name} The name of the repository to which the data is uploaded. E.g., Figshare, Dataverse, etc.
\paragraph{Publication date} If already known, the date in which the dataset was published in the repository (YYYY-MM-DD).
\section{Method}
Describe the methods used in the study.
\section{Results and discussion}
Describe and discuss the results of the study.
\section{Implications/Applications}
Provide information about the implications of this research and/or how it can be applied.
\section*{Acknowledgements}
Please add any relevant acknowledgements to anyone else that assisted with the project in which the data was created but did not work directly on the data itself.
\section*{Funding Statement}
If the research resulted from funded research please list the funder and grant number here.
\section*{Competing interests}
If any of the authors have any competing interests then these must be declared. If there are no competing interests to declare then the following statement should be present: The author(s) has/have no competing interests to declare.
\bibliographystyle{johd}
\bibliography{bib}
\section*{Supplementary Files (optional)}
Any supplementary/additional files that should link to the main publication must be listed, with a corresponding number, title and option description. Ideally the supplementary files are also cited in the main text.
Note: supplementary files will not be typeset so they must be provided in their final form. They will be assigned a DOI and linked to from the publication.
\end{document}

1
TSW/relwork.tex Normal file
View File

@ -0,0 +1 @@
http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf

View File

@ -0,0 +1,13 @@
# locore0.S
I should have expected problems with the compiler, but not so early. The
first problem that I encountered here is that in 64-bit Intel mode
there were no .code32 and .code16 directives. I don't know why they didn't
add this support. Maybe because no one tried to use the compiler for
such a task.
However, I patched the compiler and the .S file is ok with the directive.
We will see on the generated output if the machine code is correct.
Until then I still have to patch the compiler because there is no popfl
instruction. I don't even know what it does, but the problem needs to be
solved.
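
For my own reference, a minimal sketch of what handling these directives
boils down to: the assembler keeps a current code size and each
directive just switches it. This is not tcc's code; `asm_state` and
`apply_code_directive` are hypothetical names, and only the `seg_size`
field name is borrowed from tcc. I am assuming that .code64 should be
accepted only when the target is x86_64.

```c
#include <string.h>

/* Hypothetical stand-in for the assembler state. */
struct asm_state {
    int seg_size; /* current code size: 16, 32, or 64 */
};

/* Apply a .code16/.code32/.code64 directive to the state.
   Returns 0 on success, -1 if the directive is unknown or not
   valid for the target (the state is left unchanged in that case). */
int apply_code_directive(struct asm_state *st, const char *directive,
                         int target_is_x86_64)
{
    if (strcmp(directive, ".code16") == 0) {
        st->seg_size = 16;
    } else if (strcmp(directive, ".code32") == 0) {
        st->seg_size = 32;
    } else if (strcmp(directive, ".code64") == 0 && target_is_x86_64) {
        st->seg_size = 64;
    } else {
        return -1;
    }
    return 0;
}
```

The instruction encoder would then consult `seg_size` to decide on
operand-size and address-size prefixes, which is the part I still need
to check in the generated output.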

15
Tech/tcc/gnu.md Normal file
View File

@ -0,0 +1,15 @@
# GNU
TCC tries to keep a lot of compatibility with GCC. I
wonder what the reason for that is. I don't think that the author was
dumb enough to think that if the project kisses the ass of GCC it will
stay popular and relevant.
Just grep GCC in /tinycc and you will see how much the
project depends on GNU. Even the problem I solved earlier with .code32 in 64 bit
mode had to do with GAS, the GNU assembler. The makefile is written in
GNU make dialect, in /tinycc/configure gcc is the default compiler, etc,
etc.
I'm starting to think that maybe tcc is not the best solution for what
I'm trying to do.

View File

@ -77,7 +77,7 @@ CWARNFLAGS+= -Wno-address-of-packed-member -Wno-constant-conversion \
DEBUG?= -g
COPTIMIZE?= -O2
CFLAGS= ${DEBUG} ${CWARNFLAGS} ${CMACHFLAGS} ${COPTIMIZE} ${COPTS} ${PIPE}
AFLAGS= -D_LOCORE -x assembler-with-cpp ${CWARNFLAGS} ${CMACHFLAGS}
AFLAGS= -D_LOCORE ${CWARNFLAGS} ${CMACHFLAGS}
LINKFLAGS= -T ld.script -X --warn-common -nopie
HOSTCC?= ${CC}

View File

@ -805,6 +805,8 @@ LIBTCCAPI TCCState *tcc_new(void)
#endif
#ifdef TCC_TARGET_I386
s->seg_size = 32;
#elif defined(TCC_TARGET_X86_64)
s->seg_size = 64;
#endif
/* enable this if you want symbols with leading underscore on windows: */
#if defined TCC_TARGET_MACHO /* || defined TCC_TARGET_PE */

View File

@ -821,8 +821,9 @@ struct TCCState {
unsigned char has_text_addr;
addr_t text_addr; /* address of text section */
unsigned section_align; /* section alignment */
#ifdef TCC_TARGET_I386
int seg_size; /* 32. Can be 16 with i386 assembler (.code16) */
#if defined(TCC_TARGET_I386) || defined(TCC_TARGET_X86_64)
/* 16, 32, or 64 depending on the .code{16,32,64} directive */
int seg_size;
#endif
char *tcc_lib_path; /* CONFIG_TCCDIR or -B option */

View File

@ -898,7 +898,7 @@ static void asm_parse_directive(TCCState *s1, int global)
next();
pop_section(s1);
break;
#ifdef TCC_TARGET_I386
#if defined(TCC_TARGET_I386) || defined(TCC_TARGET_X86_64)
case TOK_ASMDIR_code16:
{
next();
@ -913,9 +913,11 @@ static void asm_parse_directive(TCCState *s1, int global)
break;
#endif
#ifdef TCC_TARGET_X86_64
/* added for compatibility with GAS */
case TOK_ASMDIR_code64:
next();
{
next();
s1->seg_size = 64;
}
break;
#endif
default:

View File

@ -1027,7 +1027,7 @@ static void rt_exit(int code)
#ifndef _WIN32
# include <signal.h>
# ifndef __OpenBSD__
# include <sys/ucontext.h>
# include <sys/signal.h>
# endif
#else
# define ucontext_t CONTEXT

View File

@ -393,10 +393,11 @@
DEF_ASMDIR(endr)
DEF_ASMDIR(org)
DEF_ASMDIR(quad)
#if defined(TCC_TARGET_I386)
#if defined(TCC_TARGET_I386) || defined(TCC_TARGET_X86_64)
DEF_ASMDIR(code16)
DEF_ASMDIR(code32)
#elif defined(TCC_TARGET_X86_64)
#endif
#if defined(TCC_TARGET_X86_64)
DEF_ASMDIR(code64)
#endif
DEF_ASMDIR(short)